Research
On August 1st, 2022, I joined Dr. Daniel Levy's lab at NHLBI as a staff scientist. For those who know me, this is an interesting development; for those who don't,
here is a report in which I am a second feature.
Here is a piece in The New Yorker by Gideon Lewis-Kraus about Franklin Tao.
Here is a recent report in Science by Jeffrey Mervis.
Previously, when I was a principal investigator, my research focused on methods development in statistics and genomics.
My best work in statistics includes quantifying the null distribution of Bayes factors and Bayesian variable selection regression. My best work in genomics includes:
- Local ancestry inference
We model two scales of linkage disequilibrium (LD) using a two-layer cluster model to infer the local ancestry of admixed samples (designed for humans, but it has also been used for animals and plants). ELAI can 1) work with both haplotypes and diplotypes; 2) allow missing data; and 3) detect ancestry tracts a few tenths of a centimorgan in length.
- Haplotype-phenotype association
Using the two-layer model, we can compute local haplotype sharing (LHS) between cohort individuals and link phenotypes to LHS at each marker to perform association mapping. hapQTL can 1) work directly with diplotypes; and 2) avoid arbitrariness in specifying haplotypes.
- Prenatal screening
From early pregnancy, the mother’s peripheral blood contains fragments of cell-free DNA that originate from the fetus. Thus, by sequencing the mother’s cell-free DNA early in pregnancy, the fetus can be screened noninvasively for genetic abnormalities such as trisomy.
- Nubeam sequencing analysis
Nubeam (nucleotide be a matrix) uses matrices to represent nucleotides and products of matrices to represent reads, and assigns numbers to reads based on the product matrices. Nubeam turns a set of reads into an empirical distribution, so the genetic difference between two samples can be quantified by the difference between their empirical distributions. Many applications can be derived from these features.
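The idea can be illustrated with a toy sketch in Python. This is not the Nubeam implementation: the two 2x2 matrices, the weights used to turn a product matrix into a number, and the use of total variation distance to compare samples are all assumptions chosen only to make the example concrete.

```python
from collections import Counter

# Illustrative 2x2 matrices (assumed for this sketch, not taken from Nubeam):
# ONE is used where a read position matches the nucleotide, ZERO where it does not.
ONE = ((1, 1), (0, 1))
ZERO = ((1, 0), (1, 1))
IDENTITY = ((1, 0), (0, 1))
WEIGHTS = (1, 2, 3, 5)  # hypothetical weights for collapsing a matrix to one number


def matmul(a, b):
    """Multiply two 2x2 integer matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def read_to_numbers(read):
    """Map a read to four numbers, one per nucleotide, via matrix products."""
    numbers = []
    for base in "ACGT":
        prod = IDENTITY
        for ch in read:
            prod = matmul(prod, ONE if ch == base else ZERO)
        entries = (prod[0][0], prod[0][1], prod[1][0], prod[1][1])
        numbers.append(sum(w * e for w, e in zip(WEIGHTS, entries)))
    return tuple(numbers)


def empirical_distribution(reads):
    """Turn a set of reads into an empirical distribution over their numbers."""
    counts = Counter(read_to_numbers(r) for r in reads)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def total_variation(p, q):
    """One possible distance between two empirical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


sample_1 = empirical_distribution(["ACGT", "ACGT", "TTTT"])
sample_2 = empirical_distribution(["GGGG"])
distance = total_variation(sample_1, sample_2)
```

In this toy version, two samples with identical reads have distance 0, and samples whose reads map to entirely different numbers have distance 1. With integer weights, distinct reads can in principle map to the same numbers, which is acceptable for an illustration.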
In Dr. Daniel Levy's lab, I will continue developing statistical and computational methods, now with a focus on analyzing rich multi-omic datasets from the Framingham Heart Study and other TOPMed studies.