Run SeekSoulTools

Run tests

Example 1: Basic usage

Prepare the inputs for the analysis: the paths to the sample FASTQ files, the chemistry (kit) version, the STAR genome index, the gene annotation (GTF) file, and so on. Then run SeekSoulTools with the following command:

seeksoultools rna run \
--fq1 /path/to/demo_dd/demo_dd_S39_L001_R1_001.fastq.gz \
--fq2 /path/to/demo_dd/demo_dd_S39_L001_R2_001.fastq.gz \
--samplename demo_dd \
--genomeDir /path/to/GRCh38/star \
--gtf /path/to/GRCh38/genes/genes.gtf \
--chemistry DDV2 \
--core 4 \
--include-introns
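Before starting a long run, it can help to catch simple input mistakes early. The helpers below are a minimal sketch in POSIX sh. The function names valid_samplename and paths_exist are hypothetical and are not part of SeekSoulTools. The first applies the --samplename rule (letters, digits, and underscores only), and the second confirms that the FASTQ, index, and GTF paths exist.

```shell
# Hypothetical pre-flight helpers (not part of SeekSoulTools), POSIX sh.

# Succeed only if $1 is non-empty and contains nothing but letters,
# digits, and underscores, the characters --samplename accepts.
valid_samplename() {
    case "$1" in
        '' | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed only if every argument is a readable path (file or directory);
# otherwise name the first missing one on stderr and fail.
paths_exist() {
    for p in "$@"; do
        if [ ! -r "$p" ]; then
            echo "missing or unreadable: $p" >&2
            return 1
        fi
    done
    return 0
}
```

For example, chain them in front of the command: valid_samplename demo_dd && paths_exist /path/to/GRCh38/star /path/to/GRCh38/genes/genes.gtf && seeksoultools rna run ...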

Example 2: Specify a different version of STAR for analysis

To align with a specific version of STAR, for example the version that was used to generate the index passed to --genomeDir, give the path to that STAR binary with --star_path:

seeksoultools rna run \
--fq1 /path/to/demo_dd/demo_dd_S39_L001_R1_001.fastq.gz \
--fq2 /path/to/demo_dd/demo_dd_S39_L001_R2_001.fastq.gz \
--samplename demo_dd \
--genomeDir /path/to/GRCh38/star \
--gtf /path/to/GRCh38/genes/genes.gtf \
--chemistry DDV2 \
--core 4 \
--include-introns \
--star_path /path/to/cellranger-5.0.0/lib/bin/STAR
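To see whether a STAR binary and an index are likely to match, you can print both versions side by side. This is a minimal sketch with hypothetical helper names (genome_version, star_version). It assumes that the index directory contains STAR's genomeParameters.txt with a versionGenome line, and that the binary accepts --version.

```shell
# Hypothetical helpers (not part of SeekSoulTools). Assumptions: the index
# directory holds STAR's genomeParameters.txt with a "versionGenome" line,
# and the STAR binary accepts --version.

# Print the genome version recorded in a STAR index directory ($1).
genome_version() {
    awk '$1 == "versionGenome" { print $2; exit }' "$1/genomeParameters.txt"
}

# Print the version reported by a STAR binary ($1).
star_version() {
    "$1" --version
}
```

For the paths above, compare star_version /path/to/cellranger-5.0.0/lib/bin/STAR with genome_version /path/to/GRCh38/star. STAR's compatibility rules are not simple string equality, so treat a difference as a reason to check the STAR release notes, not as a definite failure.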

Example 3: A sample has multiple sets of FASTQ files

If a sample has multiple sets of FASTQ files (for example, one set per sequencing lane), pass all of them by repeating --fq1 and --fq2, listing the R1 and R2 files in the same order. For example:

seeksoultools rna run \
--fq1 /path/to/demo_dd_S39_L001_R1_001.fastq.gz \
--fq1 /path/to/demo_dd_S39_L002_R1_001.fastq.gz \
--fq2 /path/to/demo_dd_S39_L001_R2_001.fastq.gz \
--fq2 /path/to/demo_dd_S39_L002_R2_001.fastq.gz \
--samplename demo \
--genomeDir /path/to/GRCh38/star \
--gtf /path/to/GRCh38/genes/genes.gtf \
--chemistry DDV2 \
--core 4 \
--include-introns
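With many lanes, the repeated options can be generated rather than typed by hand. The helper below is a minimal sketch with a hypothetical name, fastq_args. It assumes Illumina-style file names such as demo_dd_S39_L001_R1_001.fastq.gz, where each R2 file differs from its R1 mate only in _R1_ versus _R2_. It prints all --fq1 options first, then all --fq2 options in the same lane order, matching the layout of the command above.

```shell
# Hypothetical helper (not part of SeekSoulTools). Assumes Illumina-style
# names <prefix>_L<lane>_R1_001.fastq.gz with matching _R2_ mates.

# Print "--fq1 <R1> ... --fq2 <R2> ..." for every lane of a sample,
# keeping R1 and R2 files in the same (glob-sorted) lane order.
fastq_args() {
    dir=$1
    prefix=$2
    fq1=''
    fq2=''
    for r1 in "$dir/$prefix"_L*_R1_001.fastq.gz; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ ! -e "$r1" ]; then
            echo "no ${prefix}_L*_R1_001.fastq.gz files in $dir" >&2
            return 1
        fi
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
        if [ ! -e "$r2" ]; then
            echo "missing R2 mate for $r1" >&2
            return 1
        fi
        fq1="$fq1 --fq1 $r1"
        fq2="$fq2 --fq2 $r2"
    done
    # Drop the leading space and print all options on one line.
    printf '%s\n' "${fq1# }$fq2"
}
```

For example: seeksoultools rna run $(fastq_args /path/to demo_dd_S39) --samplename demo, followed by the remaining options from the command above. The output relies on word splitting, so this only works for paths without spaces.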

Example 4: Customize the structure of R1

To use a custom read 1 (R1) structure instead of a preset --chemistry, specify the structure, the barcode files, and the linker files explicitly:

seeksoultools rna run \
--fq1 /path/to/demo_dd_S39_L001_R1_001.fastq.gz \
--fq2 /path/to/demo_dd_S39_L001_R2_001.fastq.gz \
--samplename demo \
--genomeDir /path/to/GRCh38/star \
--gtf /path/to/GRCh38/genes/genes.gtf \
--barcode /path/to/utils/CLS1.txt \
--barcode /path/to/utils/CLS2.txt \
--barcode /path/to/utils/CLS3.txt \
--linker /path/to/utils/Linker1.txt \
--linker /path/to/utils/Linker2.txt \
--structure B9L12B9L13B9U8 \
--core 4 \
--include-introns

  • The R1 structure B9L12B9L13B9U8 consists of three 9-base cell barcode segments (B9), two linker segments, and an 8-base UMI (U8), in this order: barcode 1, a 12-base linker (L12), barcode 2, a 13-base linker (L13), barcode 3, then the UMI. The two linkers separate the three barcode segments.

  • Repeat --barcode three times to give the files for the three barcode segments in order, and repeat --linker twice to give the files for the two linkers in order.
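To check a structure string before a run, you can expand it into base positions. The helper below is a minimal sketch with a hypothetical name, parse_structure. It reads the letters as they are used in this example (B = barcode, L = linker, U = UMI) and ignores any offset that --shift may apply. For each segment it prints the type, length, and 1-based position range, then the total length. The total is the minimum R1 length that the structure needs.

```shell
# Hypothetical helper (not part of SeekSoulTools): expand a --structure
# string such as B9L12B9L13B9U8 into segments with 1-based positions.
# Ignores any offset that --shift may apply.
parse_structure() {
    printf '%s\n' "$1" | awk '
    {
        s = $0
        pos = 1
        total = 0
        # Consume one <LETTER><digits> segment at a time from the front.
        while (match(s, /^[A-Z][0-9]+/)) {
            type = substr(s, 1, 1)
            len = substr(s, 2, RLENGTH - 1) + 0
            printf "%s\t%d\t%d-%d\n", type, len, pos, pos + len - 1
            pos += len
            total += len
            s = substr(s, RLENGTH + 1)
        }
        # Anything left over does not fit the <LETTER><digits> pattern.
        if (s != "") { print "unparsed: " s > "/dev/stderr"; exit 1 }
        print "total\t" total
    }'
}
```

For B9L12B9L13B9U8 the UMI occupies bases 53-60 and the total is 60 bases.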

Parameter descriptions

| Parameter | Description |
| --- | --- |
| --fq1 | Path to an R1 FASTQ file. Repeat the option for samples with multiple sets of FASTQ files. |
| --fq2 | Path to an R2 FASTQ file. Repeat the option for multiple files, in the same order as the corresponding --fq1 files. |
| --samplename | Sample name. A directory with this name is created in the --outdir directory. Only letters, digits, and underscores are allowed. |
| --outdir | Output directory. Default: ./ |
| --genomeDir | Path to the reference genome index generated by STAR. The index must be built with a STAR version consistent with the STAR that SeekSoulTools uses for alignment (see --star_path). |
| --gtf | Path to the GTF annotation file for the corresponding species. |
| --core | Number of threads used for the analysis. |
| --chemistry | Reagent kit type. Each type corresponds to a preset combination of --shift, --pattern, --structure, --barcode, and --sc5p. Available options: DDV2 (SeekOne® DD Single Cell 3’ Transcriptome-seq Kit), DD5V1 (SeekOne® DD Single Cell 5’ Transcriptome-seq Kit), MM (SeekOne® MM Single Cell Transcriptome Kit), MM-D (SeekOne® MM Large-well Single Cell Transcriptome-seq Kit). |
| --skip_misB | If set, no base mismatch is allowed in the barcode. By default, 1 mismatch is allowed. |
| --skip_misL | If set, no base mismatch is allowed in the linker. By default, 1 mismatch is allowed. |
| --skip_multi | If set, reads whose barcode can be corrected to more than one whitelisted barcode are discarded. By default, such a barcode is corrected to the candidate with the highest frequency. |
| --expectNum | Expected number of captured cells. |
| --forceCell | Use when the number of cells found by the analysis is abnormal. With --forceCell N, SeekSoulTools keeps the top N cells ranked by UMI count from high to low. |
| --include-introns | When not set, only exonic reads are used for quantification. When set, intronic reads are also used for quantification. |
| --star_path | Path to an alternative STAR binary for alignment. Its version must be compatible with the index in --genomeDir. Default: the STAR found in the current environment. |
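The --forceCell rule, which keeps the top N cells by UMI count, can be illustrated with standard tools. This is a sketch of the selection rule only, not SeekSoulTools' implementation. The input is a hypothetical tab-separated table with one barcode and its UMI count per line, and the function name top_n_barcodes is also hypothetical.

```shell
# Illustration only (not SeekSoulTools code): apply the --forceCell rule
# "take the top N barcodes by UMI count" to a hypothetical two-column,
# tab-separated table of <barcode> <UMI count>.
top_n_barcodes() {
    n=$1
    table=$2
    # Sort numerically by UMI count, highest first, keep N, print barcodes.
    sort -t "$(printf '\t')" -k2,2nr "$table" | head -n "$n" | cut -f1
}
```

For example, top_n_barcodes 5000 counts.tsv lists the 5000 barcodes with the most UMIs, where counts.tsv is a placeholder name.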
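The barcode options describe a one-mismatch correction against a whitelist. The sketch below illustrates that idea for a single barcode segment. It is not SeekSoulTools' implementation. It assumes a hypothetical whitelist file with one sequence per line and uses a hypothetical function name, correct_barcode. An exact match is accepted first. Otherwise, a single whitelist entry at Hamming distance 1 is accepted. If several entries are one mismatch away, the barcode is rejected, which is the behavior --skip_multi describes. The default frequency-based choice among candidates is not modelled here, and with --skip_misB only the exact-match step would apply.

```shell
# Illustration only (not SeekSoulTools code): correct one barcode segment
# against a hypothetical whitelist file (one sequence per line), allowing
# at most one mismatch. Prints the accepted entry; prints nothing and
# fails when there is no match or the one-mismatch match is ambiguous.
correct_barcode() {
    awk -v obs="$1" '
    # Number of differing positions; -1 for unequal lengths, so those
    # never count as one-mismatch hits.
    function hamming(a, b,    i, d) {
        if (length(a) != length(b)) return -1
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        return d
    }
    $1 == obs { print obs; found = 1; exit }
    hamming($1, obs) == 1 { hits[++n] = $1 }
    END {
        if (found) exit 0
        if (n == 1) { print hits[1]; exit 0 }
        exit 1
    }' "$2"
}
```

For example, correct_barcode GGGGGGGGA whitelist.txt prints GGGGGGGGG if that is the only whitelist entry one mismatch away. Here whitelist.txt is a placeholder name.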