郑一丁 · General Discussion Group

Can WeGene data be imported directly into 23andMe?

How do I do it?
2016-10-08 • IP location: China

12 replies

田园牧歌 - Tracing origins · WeGene
No~ Could some expert write a macro so that we can convert the data into the format of 23andMe or other companies?
田园牧歌 - Tracing origins · WeGene
There already seem to be macros online for converting between 23andMe, Ancestry, and FTDNA. Experts, please show up~~
If someone can describe the WeGene data format, I might be able to write a script that converts it. Here is the 23andMe v3 data format as I have seen it:
 
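Comment lines starting with #: explanations and so on, 15 lines in total. The data columns that follow are separated by tabs.
Column 1: the SNP's rsid (e.g. rs4477212, i5004018)
Column 2: the SNP's chromosome label (in order: 1-22, X, Y, MT)
Column 3: the SNP's position (reference human assembly build 37)
Column 4: genotype (A, C, G, T, I, D; two letters for chromosomes 1-22; one letter for X, Y, MT; every no-call is written as --)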
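
The column layout above could be read with something like the following. This is only a minimal sketch of the format as described in this post, not an official parser; the names `Call` and `parse_23andme` are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

# Chromosome labels in the order described above: 1-22, X, Y, MT.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
ALLELES = set("ACGTID")


class Call(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data line, skipping '#' comment lines and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The format is tab-separated; split() with no argument also
        # tolerates copies where the tabs were turned into spaces.
        rsid, chromosome, position, genotype = line.split()
        if chromosome not in CHROMOSOMES:
            raise ValueError(f"unexpected chromosome: {chromosome!r}")
        is_call = len(genotype) in (1, 2) and set(genotype) <= ALLELES
        if genotype != "--" and not is_call:
            raise ValueError(f"unexpected genotype: {genotype!r}")
        yield Call(rsid, chromosome, int(position), genotype)
```

To use it on a downloaded file, pass an open file object, e.g. `list(parse_23andme(open(path)))`, where `path` is wherever you saved your raw data.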
 
    I came across this question and answer on the dna.land site:
“I have a 23andMe formatted file from Genes For Good. Why can’t I upload it to DNA.LAND?”
“At this time, we are only able to accept files directly from Ancestry, FTDNA, and 23andMe. We look forward to welcoming other communities into DNA.LAND at some point in the future!”
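It looks like they check not only the format but also whether the set of sites matches.
    Also, when the site explains their method, it says: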
“DNA.Land imputes your genome, which opens the possibility of seeing genetic variations that were not part of the original file. It is similar to getting whole genome sequencing data (albeit we still miss some rare variations) without investing thousands of dollars.”
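It seems they take one person's SNPs at a given set of sites, combine them with data from public databases, and build a simulated whole-genome sequence. This probably depends on the specific set of sites they have already studied.
Hoping an expert shows up!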
田园牧歌 - Tracing origins · WeGene
# This data file generated by WeGene at: Thu, 22 Sep 2016 14:17:00
#
# This file contains the genotype called by WeGene with our
# internal quality control pipeline. The low quality sites
# were discarded. If we could not determine the genotypes,
# it will be --. As such, the call rate and accuracy may not
# be one hundred percent. So this data is suitable only for
# research, educational, and informational use and not for
# medical or other use.

# This text file is a list of your data which are TAB-
# separated. Each line corresponds to a single SNP or short
# InDel (insertion or deletion).
# For each SNP or short InDel, we provide its identifier (an
# rsid or an internal id), its location on the reference human
# genome (human assembly build 37, GRCh37) and the genotype
# call oriented with respect to the plus strand on the human
# reference sequence. For consistency, the genotypes are always
#  two base pairs, including hemizygous calls.

# Please note, as our ability to call genotypes improves, it is
# possible that your data may be slightly different at different
#  times.

# rsid    chromosome    position    genotype
rs8179414    1    565400    CC
rs9701055    1    565433    --
rs9645428    1    566810    GG
......
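
Going by this header and the 23andMe v3 description earlier in the thread, the four columns look the same. The one visible difference is that WeGene always writes two letters, including hemizygous calls, while the v3 description uses one letter on X, Y, and MT. A minimal sketch of how a conversion script might handle that is below. The function names and the `male` flag are made up for illustration, and whether any site accepts the result is untested.

```python
def to_23andme_genotype(chromosome: str, genotype: str, male: bool = False) -> str:
    """Collapse a two-letter WeGene call to one letter where only one copy exists.

    Assumption: Y and MT are always single-copy, and X is single-copy only
    when `male` is True. No-calls and heterozygous calls are left untouched.
    """
    single_copy = chromosome in ("Y", "MT") or (chromosome == "X" and male)
    same_letters = len(genotype) == 2 and genotype[0] == genotype[1]
    if single_copy and same_letters and genotype != "--":
        return genotype[0]
    return genotype


def convert_line(line: str, male: bool = False) -> str:
    """Convert one line; comment and blank lines pass through unchanged."""
    stripped = line.rstrip("\n")
    if not stripped.strip() or stripped.startswith("#"):
        return stripped
    rsid, chromosome, position, genotype = stripped.split()
    genotype = to_23andme_genotype(chromosome, genotype, male)
    # Always write real tabs, even if the input used spaces.
    return "\t".join((rsid, chromosome, position, genotype))
```

Applied to each line of the file in turn, this leaves autosomal calls such as `CC` and no-calls such as `--` as they are and only rewrites the separators.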
# This data file generated by 23andMe at: Sat Nov 24 12:23:14 2012
#
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference
# sequence.     We are using reference human assembly build 37.  Note that it is possible
# that data downloaded at different times may be different due to ongoing improvements
# in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
#
# More information on reference human assembly build 37:
# http://www.ncbi.nlm.nih.gov/projects/mapview/map_search.cgi?taxid=9606&build=37
#
# rsid chromosome position genotype
 
# This data file generated by 23andMe at: Mon Nov 21 23:41:25 2016
#
# This file contains raw genotype data, including data that is not used in 23andMe reports.
# This data has undergone a general quality review however only a subset of markers have been
# individually validated for accuracy. As such, this data is suitable only for research,
# educational, and informational use and not for medical or other use.
#
# Below is a text version of your data.  Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
#
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid chromosome position genotype
 
田园牧歌 - Tracing origins · WeGene
The boss said before that they don't bother with us because we have too few users...
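    The first item on this page (http://www.math.mun.ca/~dapike/FF23utils/), Search for Runs of Homozygosity (ROHs), computes the lengths of identical segments on the autosomal chromosome pairs. Because the chip does not cover every SNP, the result is approximate and depends on the chip version. Only results from different people tested on the same chip are easy to compare directly.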
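    This calculation gives a kind of consanguinity analysis (whether a person's recent ancestors came from a closely related population). Quite a few people have compared their results on the 23andMe forum. (The first item's own page links to several references.)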
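
To give a rough idea of what a run-of-homozygosity search does, here is a toy sketch. It is not the algorithm used by the page above (I have not seen its code). It only collects stretches of consecutive homozygous autosomal SNPs. The name `homozygous_runs`, the `min_snps` default, and the choice to skip no-calls are all my own assumptions.

```python
from typing import Iterable, List, Tuple

AUTOSOMES = {str(n) for n in range(1, 23)}

# (chromosome, position, genotype), assumed sorted by chromosome and position.
Snp = Tuple[str, int, str]
# (chromosome, start position, end position, number of SNPs in the run)
Run = Tuple[str, int, int, int]


def homozygous_runs(snps: Iterable[Snp], min_snps: int = 100) -> List[Run]:
    """Return runs of at least `min_snps` consecutive homozygous autosomal calls."""
    runs: List[Run] = []
    current = None  # [chromosome, start, end, count] of the run being built

    def close() -> None:
        nonlocal current
        if current is not None and current[3] >= min_snps:
            runs.append((current[0], current[1], current[2], current[3]))
        current = None

    for chromosome, position, genotype in snps:
        if chromosome not in AUTOSOMES:
            close()
            continue
        if genotype == "--":
            continue  # assumption: a no-call neither extends nor breaks a run
        if current is not None and current[0] != chromosome:
            close()
        if len(genotype) != 2 or genotype[0] != genotype[1]:
            close()  # a heterozygous call ends the run
            continue
        if current is None:
            current = [chromosome, position, position, 1]
        else:
            current[2] = position
            current[3] += 1
    close()
    return runs
```

The length of a run is then simply end minus start. Since the SNP spacing differs between chips, the same threshold will give different results on different chips, which is the point made above.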
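    According to this page, it seems that to use WeGene data you can change the first line of the file to match this: beginning with # and containing the string "23andMe". If it really works, we could start a new thread to compare results.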
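
The first-line change could be done by hand in a text editor, or with a few lines like these. This is a sketch only: the function name and the wording of the fallback header line are made up, and I have not checked that the tool accepts the result. The data itself is still WeGene data, only the label changes.

```python
from typing import List


def relabel_first_line(lines: List[str]) -> List[str]:
    """Return a copy whose first line begins with '#' and contains '23andMe'."""
    if lines and lines[0].startswith("#"):
        first = lines[0]
        if "23andMe" in first:
            return list(lines)  # already in the expected form
        if "WeGene" in first:
            # Swap only the vendor name; the timestamp is kept as-is.
            return [first.replace("WeGene", "23andMe", 1)] + list(lines[1:])
    # Otherwise add a new first line rather than overwrite anything.
    return ["# 23andMe-style header added for compatibility\n"] + list(lines)
```

Read the file with `readlines()`, pass the list through this function, and write the result to a new file so the original download stays untouched.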
 
