
Hi! I'm Tommy Tang

Bioinformatics? You do not need machine learning most of the time

Published 2 months ago • 1 min read

Hello Bioinformatics lovers,

I know many of you want to use machine learning in bioinformatics. While that sounds exciting, day-to-day bioinformatics work is mostly mundane: data cleaning, wrangling, and plotting.

Mastering those basic skills alone can get you very far in solving practical bioinformatics problems.

I am not saying machine learning is not important. I am learning it myself and keeping up with the frontier of deep learning.

While much of the field is dazzled by ever more complicated methods, I prefer simpler methods for biological data. Some examples 👇

1. "Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student's t-test, and logistic regression." (A comparison of marker gene selection methods for single-cell RNA sequencing data)

Read my old post if you are interested: marker gene selection using logistic regression and regularization for scRNAseq.

2. "We compared 22 transformations, conceptually grouped into four basic approaches, for their ability to recover latent structure among the cells. We found that one of the simplest approaches, the shifted logarithm transformation with y0 = 1 followed by PCA, performed surprisingly well." (Comparison of transformations for single-cell RNA-seq data)

3. The Wilcoxon rank-sum test beats DESeq2 :) when you have hundreds of samples. (Exaggerated false positives by popular differential expression methods when analyzing human population samples)

4. GC content predicts expression better than a foundation large language model :)
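Items 1 and 3 both come down to running the Wilcoxon rank-sum test gene by gene. Here is a minimal sketch in plain Python (standard library only), assuming you already have per-sample expression values for one gene in two groups, e.g. one cluster vs. all other cells, or cases vs. controls. It uses the normal approximation with a tie correction and no continuity correction, then adjusts p-values across genes with Benjamini-Hochberg. The gene names and numbers are hypothetical and made up for illustration; this is not the code from the cited papers, and for real data you would use an established implementation such as scipy.stats.mannwhitneyu.

```python
import math
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test via the normal approximation.

    Returns (U statistic for group x, two-sided p-value).
    Tie-corrected variance, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Count how often each value occurs, for the tie correction.
    tie_counts: dict[float, int] = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u1, 1.0  # every value identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def benjamini_hochberg(pvals: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes with made-up values: (group 1 samples, group 2 samples).
toy_data = {
    "GENE_A": ([5.1, 6.0, 5.8, 6.3, 5.9], [2.0, 2.4, 1.9, 2.2, 2.6]),
    "GENE_B": ([3.0, 3.1, 2.9, 3.2, 3.0], [3.1, 2.8, 3.0, 3.3, 2.9]),
}
results = {g: wilcoxon_rank_sum(a, b) for g, (a, b) in toy_data.items()}
padj = benjamini_hochberg([p for _, p in results.values()])
for (gene, (u, p)), q in zip(results.items(), padj):
    print(f"{gene}: U={u:.1f}  p={p:.4f}  BH-adjusted={q:.4f}")
```

With only five samples per group the normal approximation is rough; it is used here because item 3 is about the regime with hundreds of samples, where the approximation works well and the test makes no distributional assumption about the counts.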

I covered this in my video: Three misbeliefs of being a computational biologist
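The winning recipe from item 2, the shifted logarithm with y0 = 1, is short enough to write out. Here is a minimal sketch in plain Python, assuming a small cells × genes count matrix held as nested lists, and assuming each cell's size factor s is its total count divided by the mean total count across cells (one common choice). Each count y becomes log(y / s + y0). The PCA step that follows in the paper is left out; in practice you would run it with a library such as scikit-learn. The counts below are made up for illustration.

```python
import math


def size_factors(counts: list[list[float]]) -> list[float]:
    """Per-cell size factor: cell total / mean cell total (assumed definition).

    Assumes every cell has a nonzero total count.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def shifted_log(counts: list[list[float]], y0: float = 1.0) -> list[list[float]]:
    """Apply log(y / s + y0) to every count y, where s is that cell's size factor."""
    factors = size_factors(counts)
    return [[math.log(y / s + y0) for y in cell] for cell, s in zip(counts, factors)]


# Made-up counts: 3 cells (rows) x 4 genes (columns).
toy_counts = [
    [0, 2, 10, 4],
    [0, 4, 20, 8],  # same profile as cell 1, sequenced twice as deep
    [1, 1, 5, 1],
]
transformed = shifted_log(toy_counts)
for row in transformed:
    print([round(v, 3) for v in row])
```

Because the second toy cell is exactly twice the first, both rows come out identical after dividing by their size factors, which is what the normalization is for. A zero count always maps to log(1) = 0, whatever the sequencing depth.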

Other resources:

  1. Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads
  2. If you have bulk RNAseq data, this is a very nice interactive dashboard to share with wet-lab biologists: https://degust.erc.monash.edu/
  3. Modeling single cell RNAseq data with multinomial distribution
  4. scAbsolute: measuring single-cell ploidy and replication status
  5. Master Bioinformatics RNAseq Analysis from Scratch: A Beginner's Guide
  6. From GEO fastq to counts for bulkRNAseq

Happy Learning!

Tommy

Let's connect on Twitter and LinkedIn!

I am a computational biologist with six years of wet lab experience and over ten years of computational experience. I will help you learn the computational skills to tame astronomical amounts of data and derive insights. Check out the resources I offer below and sign up for my newsletter! https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources
