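2 Zhuji Cuixi Academy of Biotechnology, Zhuji, 311800;
3 Hainan Institute of Tropical Agricultural Resources, Sanya, 572025;
4 College of Life Science and Technology, Guangxi University, Nanning, 530005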
Genomics and Biotechnology (基因组学与生物技术), 2017, Vol. 6, No. 1
Received: February 28, 2017 Accepted: March 28, 2017
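Using the principles of comparative genomics and gene ontology, this study drew on genome data from soybean and the model legumes Medicago truncatula and Lotus japonicus to identify, at the whole-genome level, legume candidate genes for 11 groups of circadian clock regulators: ELF3/ELF4, CCR2, CRB, LWD1/LWD2, FIO1, LUX, ZIK4, LIP1, SFR6, ARR3/ARR4, and TEJ. Molecular phylogenetic trees were built from the identified orthologous candidate genes and, together with functional domain analysis and multiple sequence alignment, were used to analyze the function and evolutionary trends of these candidates. The analysis indicates that the circadian clock regulatory pathway has diverged to different degrees between Arabidopsis and the three legume species. Relative to Arabidopsis, the clock-related genes of Lotus japonicus and Medicago truncatula have diverged markedly more than those of soybean, possibly because soybean underwent numerous genome rearrangement events during its evolution.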
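In most higher plants, the endogenous circadian clock participates in regulating nearly all metabolic, growth, and developmental processes and exerts clear control over many physiological and biochemical reactions (Li et al., 2015; Covington et al., 2008). Leaf movement, stomatal opening and closing, hypocotyl elongation, and especially photoperiod-regulated flowering time are all controlled by the plant's internal clock (Barak et al., 2000; McClung, 2001). The clock coordinates physiological activities, including growth and development, so that they take place at the appropriate time (Harmer et al., 2000; McClung, 2001).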
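Research on circadian clock regulatory networks has so far concentrated on the model plant Arabidopsis thaliana, and comparable studies in other plants remain scarce. Many clock-related genes have been identified in Arabidopsis, and some of the mechanisms by which the clock regulates physiological pathways have been revealed. Most of these genes encode transcription factors, and others encode kinases and phosphatases (Dunlap, 2004; Edery, 2005; Toh et al., 2001). Most of them are associated with plant light signaling pathways.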
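In this study, Medicago truncatula and Lotus japonicus serve as model legumes, while soybean is an important economic crop. All three species have had their whole genomes sequenced, which provides reasonably complete genome data for our analysis. For the model plant Arabidopsis, we identified circadian clock regulators mainly through Gene Ontology (GO) and supplemented and corroborated them with the relevant literature. The resulting regulators were then compared at the whole-genome level against Medicago truncatula, Lotus japonicus, and soybean, followed by the corresponding bioinformatic analyses.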
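1 Results and Analysis
1.1 Identification of candidate ELF3/ELF4 homologs in legumes
No conserved functional domain was found for ELF3 in the CDD, Prosite, or Pfam-A databases. However, the Pfam-B database and the multiple sequence alignment show that ELF3 has four relatively conserved domains. ELF4 has one conserved DUF domain (Figure 1).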
Figure 1 ELF3/ELF4 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: ELF3 amino acid sequence alignment; B: Conserved domain analysis of the ELF4 protein; C: ELF4 amino acid sequence alignment; D: Molecular phylogenetic tree of the ELF3/ELF4 proteins; E: Evolutionary distance of ELF3/ELF4 among the four species
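In Medicago truncatula, no orthologous candidate gene was found for ELF3, while ELF4 has one ortholog with 27.0% similarity. In Lotus japonicus, ELF3 has two orthologous candidate gene fragments and ELF4 has one orthologous gene fragment. In soybean (Glycine max (L.) Merr.), ELF3 has six orthologous candidate genes, whose sequence similarity is generally higher than that of the ELF3 candidates in Medicago truncatula and Lotus japonicus. In addition, the three candidates with similarity above 30%, ELF3a, ELF3b, and ELF3c, are all supported by high-quality PUTs (Table 1).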
Table 1 Orthologous candidate genes of ELF3/ELF4 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; N: no PUT sequence meeting the criteria matches the gene; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
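1.2 Identification of candidate CCR2 homologs in legumes
CCR2 has two orthologous candidate genes in Medicago truncatula, with about 70% similarity to CCR2; MtCCR2a is covered along its full length by PUT sequences. CCR2 also has two orthologs in Lotus japonicus, both covered along their full length by PUT sequences, with similarities of 64.0% and 66.8%, respectively. In soybean, CCR2 has four orthologs with similarities between 50% and 70%, three of which have full-length PUT coverage (Table 2, Figure 2).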
Table 2 Orthologous candidate genes of CCR2 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 2 CCR2 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the CCR2 protein; B: CCR2 amino acid sequence alignment; C: Molecular phylogenetic tree of the CCR2 proteins; D: Evolutionary distance of CCR2 among the four species
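1.3 Identification of orthologous candidate genes of CRB in legumes
CRB has one orthologous candidate gene in Lotus japonicus and two in soybean. All three genes are expressed at full length in their respective species, and CRB is highly conserved among Arabidopsis, Lotus japonicus, and soybean, with similarity above 80% (Table 3, Figure 3).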
Table 3 Orthologous candidate genes of CRB in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences
Figure 3 Molecular phylogenetic tree and evolutionary distance of the CRB proteins. Note: A: Molecular phylogenetic tree of the CRB proteins; B: Evolutionary distance of the CRB protein among the four species
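1.4 Identification of orthologous candidate genes of LWD1/LWD2 in legumes
The LWD1 and LWD2 proteins each have one conserved WD40 domain in the middle of the protein, toward the C-terminus. No ortholog of LWD2 was found in any of the three legume species; it may have been lost during the long evolution of legumes. LWD1 is very highly conserved between Arabidopsis and the three legume species: the similarities of MtLWD1, LjLWD1, GmLWD1a, and GmLWD1b to LWD1 are 90.6%, 91.2%, 92.4%, and 92.1%, respectively. Apart from MtLWD1, the gene is expressed at full length in both Lotus japonicus and soybean (Table 4, Figure 4).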
Table 4 Orthologous candidate genes of LWD1/LWD2 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 4 LWD1/LWD2 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the LWD1/LWD2 proteins; B: LWD1/LWD2 amino acid sequence alignment; C: Molecular phylogenetic tree of the LWD1/LWD2 proteins; D: Evolutionary distance of LWD1/LWD2 among the four species
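1.5 Identification of candidate FIO1 homologs in legumes
No ortholog of FIO1 was found in Medicago truncatula. Lotus japonicus has one orthologous candidate gene fragment, but no EST expression data support it, so it is very likely a pseudogene. We therefore speculate that FIO1 may have degenerated during the long evolution of the genus Medicago. Soybean has two orthologous candidate genes, both full length; one has EST expression support and the other is expressed at full length. This indicates that FIO1 still functions normally in soybean, and judging from similarity and EST expression data, GmFIO1b is the more likely of the two to carry out the clock function (Table 5, Figure 5).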
Table 5 Orthologous candidate genes of FIO1 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; N: no PUT sequence meeting the criteria matches the gene; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 5 Conserved domain analysis and amino acid sequence alignment of the FIO1 protein. Note: A: Conserved domain analysis of the protein; B: Amino acid sequence alignment
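1.6 Identification of candidate LUX homologs in legumes
LUX has one ortholog each in Medicago truncatula and Lotus japonicus, both with EST expression data, with similarities of 45.7% and 42.3%, respectively. LUX has two orthologs in soybean, of which GmLUX2 is supported by full-length PUT sequences (Table 6).
From the molecular phylogenetic tree and the evolutionary distances (Figure 6), we can conclude that GmLUX1 and GmLUX2 diverged relatively recently.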
Table 6 Orthologous candidate genes of LUX in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 6 LUX amino acid sequence alignment, molecular phylogenetic tree, and evolutionary distance. Note: A: LUX amino acid sequence alignment; B: Molecular phylogenetic tree of the LUX proteins; C: Evolutionary distance of LUX among the four species
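1.7 Identification of orthologous candidate genes of ZIK4 in legumes
ZIK4 has three orthologous candidate genes in Medicago truncatula, all with more than 50% similarity to ZIK4 and all with EST expression data. ZIK4 has two orthologous candidate genes in Lotus japonicus, likewise with more than 50% similarity and with EST expression data.
ZIK4 has five orthologous candidate genes in soybean. Among them, GmZIK4a has more than 60% similarity to ZIK4 and has EST expression data, whereas GmZIK4c has less than 50% similarity; GmZIK4c and GmZIK4b are both covered along their full length by PUT sequences. The molecular phylogenetic tree built for ZIK4 shows that each of the three legume species has one paralog of ZIK4: LjZIK4b, MtZIK4c, and GmZIK4e. Their presence indicates that ZIK4 had already diverged before Fabaceae and Brassicaceae separated, and that the other ZIK4 homolog has since been lost in Arabidopsis (Table 7, Figure 7).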
Table 7 Orthologous candidate genes of ZIK4 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 7 ZIK4 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the ZIK4 protein; B: ZIK4 amino acid sequence alignment; C: Molecular phylogenetic tree of the ZIK4 proteins; D: Evolutionary distance of ZIK4 among the four species
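The evolutionary history of ZIK4 in Brassicaceae and Fabaceae may be as follows. Before the two families separated, ZIK4 duplicated into ZIK4A and ZIK4B. After the separation, ZIK4B was lost in Arabidopsis, whereas both ZIK4A and ZIK4B were retained in legumes. Before the genera Medicago and Glycine diverged, ZIK4A duplicated into ZIK4A1 and ZIK4A2. After Medicago and Glycine separated, ZIK4A2 was lost in Lotus japonicus while ZIK4A1 was retained as LjZIK4a; both ZIK4A1 and ZIK4A2 were retained in Medicago truncatula, as MtZIK4a and MtZIK4b, respectively. In soybean, ZIK4A1 and ZIK4A2 each duplicated once more: ZIK4A1 into GmZIK4b and GmZIK4c, and ZIK4A2 into GmZIK4a and GmZIK4d. ZIK4B duplicated at least once after Fabaceae and Brassicaceae diverged and then underwent gene loss, so only a single copy remains in each of the three legume species: MtZIK4c, LjZIK4b, and GmZIK4e.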
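1.8 Identification of orthologous candidate genes of LIP1 in legumes
LIP1 has one orthologous candidate gene in Medicago truncatula, with 57.3% similarity to LIP1 and with EST expression data. In Lotus japonicus, LIP1 has only one orthologous candidate gene fragment, which also has EST expression data. In soybean, LIP1 has two orthologous candidate genes, both with full-length PUT sequences; GmLIP1b shares as much as 75.5% similarity with LIP1 (Table 8, Figure 8).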
Table 8 Orthologous candidate genes of LIP1 in three legume species. Note: F: the whole candidate gene is covered by PUT sequences, or both ends are covered and 80% of the gene is covered by PUT sequences; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 8 LIP1 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the LIP1 protein; B: LIP1 amino acid sequence alignment; C: Molecular phylogenetic tree of the LIP1 proteins; D: Evolutionary distance of LIP1 among the four species
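1.9 Identification of orthologous candidate genes of SFR6 in legumes
SFR6 has one orthologous candidate gene each in Medicago truncatula and Lotus japonicus. MtSFR6 is full length, with 31.7% similarity to SFR6, but has no EST expression data; LjSFR6 is a partial sequence and also lacks EST expression data.
SFR6 has three orthologous candidate genes in soybean, all full length and all with EST expression data; GmSFR6a and GmSFR6c have relatively high similarity to SFR6. However, the amino acid sequence alignment shows that GmSFR6b has lost several large segments at the C-terminus. We therefore speculate that GmSFR6b may have lost its basic clock function and is tending toward becoming a pseudogene. GmSFR6c also lacks a segment at the C-terminus and may likewise have undergone some functional change (Table 9).
In the molecular phylogenetic tree constructed for SFR6, MtSFR6 lies on the outermost branch, indicating that MtSFR6 is a paralog of SFR6, whereas no paralog of SFR6 was found among the three orthologous candidate genes in soybean (Figure 9).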
Table 9 Orthologous candidate genes of SFR6 in three legume species. Note: N: no PUT sequence meeting the criteria matches the gene; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 9 SFR6 amino acid sequence alignment, molecular phylogenetic tree, and evolutionary distance. Note: A: SFR6 amino acid sequence alignment; B: Molecular phylogenetic tree of the SFR6 proteins; C: Evolutionary distance of the SFR6 protein among the four species
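No conserved functional domain was found for SFR6 in CDD. However, the amino acid sequence alignment shows that the C-terminal sequence of SFR6 is still fairly conserved between Arabidopsis and two of the legume species (Figure 9).
The evolutionary history of SFR6 in legumes and Arabidopsis may be as follows. The ancestral gene SFR6A duplicated once before Brassicaceae and Fabaceae separated, giving SFR6A1 and SFR6A2. During legume evolution, SFR6A1 and SFR6A2 diverged between Glycine and Medicago: SFR6A1 was lost in Medicago, which retained SFR6A2 as MtSFR6, while SFR6A2 was lost in Glycine. After Glycine and Medicago separated, SFR6A1 was duplicated in a whole-genome duplication event into GmSFR6a and GmSFR6c. GmSFR6a then, through some mechanism, underwent a further duplication that produced GmSFR6a and GmSFR6b. Because GmSFR6b had no clock-related function and was not under selective pressure, it accumulated deletion mutations at multiple sites and degenerated into a pseudogene. In Arabidopsis, SFR6A2 was lost and SFR6A1 was retained.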
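1.10 Identification of orthologous candidate genes of ARR3/ARR4 in legumes
No orthologous candidate gene of ARR4 was found in any of the three legume species. ARR3 has one orthologous candidate gene each in Medicago truncatula and Lotus japonicus, with similarities of 48.8% and 51.3%, respectively. Both are covered along their full length by PUT sequences, indicating that they are very likely expressed in vivo and perform their corresponding clock functions (Table 10, Figure 10).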
Table 10 Orthologous candidate genes of ARR3 in three legume species. Note: N: no PUT sequence meeting the criteria matches the gene; E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 10 ARR3 amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the ARR3 protein; B: ARR3 amino acid sequence alignment; C: Molecular phylogenetic tree of the ARR3 proteins; D: Evolutionary distance of the ARR3 protein among the four species
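1.11 Identification of orthologous candidate genes of TEJ in legumes
TEJ has one orthologous candidate gene in Medicago truncatula, with partial expression data, and two in Lotus japonicus, of which LjTEJ1 is full length and LjTEJ2 is a partial sequence. TEJ has one orthologous candidate gene in soybean. These candidates all share more than 50% similarity with the Arabidopsis TEJ gene (Table 11, Figure 11).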
Table 11 Orthologous candidate genes of TEJ in three legume species. Note: E: one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met
Figure 11 TEJ amino acid sequence alignment, protein conserved domain analysis, molecular phylogenetic tree, and evolutionary distance. Note: A: Conserved domain analysis of the TEJ protein; B: TEJ amino acid sequence alignment; C: Molecular phylogenetic tree of the TEJ proteins; D: Evolutionary distance of the TEJ protein among the four species
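2 Discussion
In total, we identified 34 clock-related candidate genes in soybean, 16 in Lotus japonicus, and 12 in Medicago truncatula. From the identified orthologous candidate genes we built 11 molecular phylogenetic trees and, in combination with functional domains and multiple sequence alignment, analyzed the function and evolutionary trends of these genes. Our analysis indicates that the circadian clock regulatory pathway has diverged to different degrees between Arabidopsis and these three legume species, particularly in key components such as the core components of clock regulation and the key genes controlling flowering time. With Arabidopsis as the reference, the clock-related genes of Lotus japonicus and Medicago truncatula have diverged more than those of soybean, possibly because the genomes of Lotus japonicus and Medicago truncatula are much smaller than that of soybean.
From an evolutionary perspective, orthologous genes in different species have diverged from the same ancestral gene, and their functions are generally the most similar. The candidate genes we identified through homology are therefore, in the absence of experimental data, the genes that can be inferred to be most similar in function to the corresponding Arabidopsis genes. Compared with traditional molecular biology methods, bioinformatics is more convenient and more targeted, and it allows analysis at the whole-genome level, so its conclusions are more comprehensive. Our results may serve as a reference for future experimental studies of the legume circadian clock regulatory network.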
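3 Materials and Methods
3.1 Sources and acquisition of genome data
Raw data for Medicago truncatula, Lotus japonicus, and soybean were obtained and analyzed mainly online. Table 12 lists the genome databases and corresponding websites used for the online bioinformatic analysis of the four species.
Table 12 Databases and websites used for bioinformatic analysis of the four species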
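3.2 Obtaining information on Arabidopsis circadian clock regulatory network genes using GO
We searched Gene Ontology with keywords such as biological clock, circadian clock, and circadian rhythm to obtain GO terms (Table 13), and then used the TAIR database to retrieve the sequences and other relevant information for the clock-related genes.
Table 13 GO terms and GO IDs related to the plant circadian clock regulatory network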
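3.3 Identification of orthologous genes
The amino acid sequences of the relevant Arabidopsis genes were used as queries in Blastp or tBlastn searches against the genome database of each of the three legume species. Sequences with a score above 100 and an E-value below 1E-30 were taken as candidates. The most similar sequence from each search was then used in a reverse Blastp search against the Arabidopsis protein database at NCBI to determine whether it is an ortholog. If not, the gene was considered to have no ortholog in that species; if so, that sequence was used as the reference for identifying further orthologs.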
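The screening logic described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the function names, and the gene identifiers in the usage lines are hypothetical, the highest score is used as a stand-in for "most similar", and the reverse search is assumed to take the top hit without re-applying the thresholds.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST hit. Field names are illustrative, not taken from the paper."""
    query: str
    subject: str
    score: float
    evalue: float


MIN_SCORE = 100.0   # "score above 100" in the text
MAX_EVALUE = 1e-30  # "E-value below 1E-30" in the text


def top_hit(hits: List[Hit], query: str, apply_thresholds: bool) -> Optional[Hit]:
    """Return the highest-scoring hit for a query, optionally after filtering."""
    rows = [h for h in hits if h.query == query]
    if apply_thresholds:
        rows = [h for h in rows if h.score > MIN_SCORE and h.evalue < MAX_EVALUE]
    return max(rows, key=lambda h: h.score) if rows else None


def find_ortholog(forward: List[Hit], reverse: List[Hit], at_gene: str) -> Optional[str]:
    """Forward search (Arabidopsis -> legume), then reverse check (legume -> Arabidopsis).

    Returns the legume candidate only if its best reverse hit is the original
    Arabidopsis gene; otherwise returns None (no ortholog called).
    """
    best = top_hit(forward, at_gene, apply_thresholds=True)
    if best is None:
        return None
    back = top_hit(reverse, best.subject, apply_thresholds=False)
    if back is not None and back.subject == at_gene:
        return best.subject
    return None


# Hypothetical identifiers, for illustration only.
forward_hits = [
    Hit("At_query", "Gm_cand1", 250.0, 1e-60),
    Hit("At_query", "Gm_cand2", 90.0, 1e-10),   # fails both thresholds
]
reverse_hits = [
    Hit("Gm_cand1", "At_query", 240.0, 1e-58),
    Hit("Gm_cand1", "At_other", 120.0, 1e-31),
]
print(find_ortholog(forward_hits, reverse_hits, "At_query"))
```

Once a reference ortholog is confirmed in this way, the remaining forward hits that pass the thresholds would be compared against it to identify further candidates, as the text describes.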
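3.4 Analysis of conserved functional domains
CDD (Conserved Domain Search) at NCBI (Marchler-Bauer et al., 2007; Marchler-Bauer et al., 2005) was used to analyze the conserved domains of the target sequences.
3.5 Multiple sequence alignment, phylogenetic tree construction, and evolutionary analysis
Amino acid sequences of the orthologous candidate genes from the four species were aligned with ClustalX (Devlin and Kay, 2001) using default parameters. Phylogenetic trees were constructed with the neighbor-joining (NJ) method in MEGA4 (Johnson, 2001; Mas, 2005) with 1000 bootstrap replicates, and evolutionary distances were calculated. From the resulting trees, we can also roughly infer the evolutionary history of a given gene between Fabaceae and Brassicaceae.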
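The text does not state which distance model was used in MEGA4. Purely as an illustration of what a pairwise evolutionary distance computed from an alignment looks like, the sketch below implements the simplest such measure, the p-distance (the proportion of differing sites among aligned, gap-free positions). The function names and the toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues over positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differences / compared


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of aligned sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Toy alignment with hypothetical names, for illustration only.
toy = {
    "seq1": "MKT-LV",
    "seq2": "MRTALV",
    "seq3": "MKTALV",
}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A distance matrix of this kind is the input to neighbor-joining; bootstrap support is then obtained by resampling alignment columns and rebuilding the tree for each replicate.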
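3.6 Acquisition and analysis of expression data
PUT sequences in PlantGDB are high-quality EST sequences that have been clustered to remove redundancy and partially assembled into full-length cDNAs (Tang, 2008). The nucleotide sequences of the orthologous candidate genes were aligned to the PUT sequences with BLASTN. If a matched PUT had sequence similarity greater than 95%, a length greater than 200 bp, and an E-value below 1E-30, we took it as evidence that the matched part of the candidate gene is truly expressed. EST evidence was divided into three classes: F, E, and N. F means that the whole candidate gene is covered by PUT sequences, or that both ends are covered and 80% of the gene is covered by PUT sequences. N means that no PUT sequence meeting the criteria matches the gene. E means that one or several parts of the gene are covered by qualifying PUT sequences, but the criteria for F are not met.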
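The F/E/N classification can be expressed as a small interval-coverage calculation. The sketch below is one possible reading of the criteria rather than the authors' implementation: the `PutHit` record and function names are hypothetical, coordinates are assumed to be 1-based and inclusive on the candidate gene, the matched span on the gene is used as the "length" of the match, and "80%" is read as "at least 80%".

```python
from typing import List, NamedTuple, Tuple


class PutHit(NamedTuple):
    """One BLASTN match of a PUT to a candidate gene (hypothetical record)."""
    start: int       # 1-based start on the candidate gene
    end: int         # 1-based inclusive end on the candidate gene
    identity: float  # percent identity
    evalue: float


def qualifies(hit: PutHit) -> bool:
    """Criteria from the text: identity > 95%, length > 200 bp, E-value < 1E-30."""
    return hit.identity > 95.0 and (hit.end - hit.start + 1) > 200 and hit.evalue < 1e-30


def merge(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def classify(gene_length: int, hits: List[PutHit], min_fraction: float = 0.80) -> str:
    """Return 'F', 'E', or 'N' for a candidate gene of the given length."""
    spans = [(max(1, h.start), min(gene_length, h.end)) for h in hits if qualifies(h)]
    if not spans:
        return "N"
    merged = merge(spans)
    covered = sum(end - start + 1 for start, end in merged)
    both_ends = merged[0][0] == 1 and merged[-1][1] == gene_length
    if covered == gene_length or (both_ends and covered / gene_length >= min_fraction):
        return "F"
    return "E"


# Illustrative call with made-up coordinates.
print(classify(1000, [PutHit(1, 450, 99.0, 1e-80), PutHit(600, 1000, 98.0, 1e-70)]))
```

Under this reading, a gene whose interior is well covered but whose ends are not still falls into class E, which matches the distinction the text draws between full-length and partial support.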
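3.7 Gene re-annotation
When identifying candidate homologs in the three legume species, incomplete genome information or annotation errors sometimes yielded genes without a full-length ORF. For such genes, the corresponding genomic region was re-annotated with GenScan to obtain a full-length ORF. The resulting ORF sequence was then Blast-aligned against the species' genome sequence in PlantGDB to confirm that it maps to the original genomic region, and its translated amino acid sequence was used in a reverse Blastp search against the Arabidopsis protein database to confirm that it is the relevant ortholog.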
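Author Contributions
Li Zongfei carried out the study and was responsible for experimental design, implementation, data analysis, and drafting the manuscript. Wei Fang and Liu Zhenpeng took part in data analysis and in drafting and revising the manuscript. Cai Mengdie and Zhang Jie took part in manuscript revision, translation, and reference checking. Dr. Fang Xuanjun conceived the research project and supervised the writing and revision of the paper. All authors read and approved the final text.
Acknowledgments
This study was supported by the Life Science and Biotechnology Innovation Fund of the Zhuji Cuixi Academy of Biotechnology (No. 201601201). We thank Xuan Jia for reviewing the English of the full manuscript.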
References
Barak S., Tobin E.M., Andronis C., Sugano S., and Green R.M., 2000, All in good time: the Arabidopsis circadian clock, Trends in Plant Science, 5(12): 517-522
Covington M.F., Maloof J.N., Straume M., Kay S.A., and Harmer S.L., 2008, Global transcriptome analysis reveals circadian regulation of key pathways in plant growth and development, Genome Biology, 9(9): R130
Devlin P.F., and Kay S.A., 2001, Circadian photoperception, Annual Review of Physiology, 63(1): 677-694
Doyle M.R., Davis S.J., Bastow R.M., Mcwatter H.G., Kozma-Bognár L., Nagy F., Millar A.J., and Amasino R.M., 2002, The ELF4 gene controls circadian rhythms and flowering time in Arabidopsis thaliana, Nature, 419(6920): 74-77
Dunlap J.C., 2004, Kinases and circadian clocks: per goes it alone, Developmental Cell, 6(2): 160-161
Edery I., 2005, Role of posttranscriptional regulation in circadian clocks: Lessons from Drosophila, Chronobiology International, 16(4): 377-414
Fujiwara S., Wang L., Han L., Suh S., Salome P.A., McClung C.R., and Somers D.E., 2008, Post-translational regulation of the arabidopsis circadian clock through selective proteolysis and phosphorylation of pseudo-response regulator proteins, Journal of Biological Chemistry, 283(34): 23073-23083
Harmer S.L., Hogenesch J.B., Straume M., Chang H., Han B., Zhu T., Wang X., Kreps J.A., and Kay S.A., 2000, Orchestrated transcription of key pathways in Arabidopsis by the circadian clock, Science, 290(5499): 2110-2113
Harmon F., Imaizumi T., and Gray W.M., 2008, CUL1 regulates TOC1 protein stability in the Arabidopsis circadian clock, The Plant Journal, 55(4): 568-579
Johnson C.H., 2001, Endogenous time keepers in photosynthetic organisms, Annual Review of Physiology, 63(1): 695-728
Kevei E., Gyula P., Fehér B., Tóth R., Viczián A., Kircher S., Rea D., Dorjgotov D., Schäfer E., Millar A.J., Kozma-Bognár L., and Nagy F., 2007, Arabidopsis thaliana circadian clock is regulated by the small GTPase LIP1, Current Biology, 17(17): 1456-1464
Kim J., Kim Y., Yeom M., Kim J.H., and Nam H.G., 2008, FIONA1 is essential for regulating period length in the Arabidopsis circadian clock, Plant Cell, 20(2): 307-319
Kojima S., Takahashi Y., Kobayashi Y., Monna L., Sasaki T., Araki T., and Yano M., 2002, Hd3a, a rice ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under short-day conditions, Plant Cell Physiol, 43(10): 1096-1105
Li Z.F., Zhang J., Liu Z.P., and Fang X.J., 2015, Gene Regulation Network of Biological Clock in Plant, Fenzi Zhiwu Yuzhong (online) (Molecular Plant Breeding), 13(1): 1001-1008(李宗飞, 张洁, 刘振鹏, 方宣钧, 2015, 植物生物钟的基因调控网络, 分子植物育种(online), 13(1): 1001-1008)
Li Z.F., Zhuo W., Liu Z.P., and Fang X.J., 2015, Effects of Gene Regulation of Circadian Clock on Plant Growth and Development, Douke Jiyinzuxue Yu Yichuanxue (online) (Legume Genomics and Genetics), 6(1): 1-4 (李宗飞, 周味, 刘振鹏, 方宣钧, 2015, 生物钟的基因调控对植物生长发育的影响, 豆科基因组学与遗传学(online), 6(1): 1-4)
Marchler-Bauer A., Anderson J.B., Cherukuri P.F., DeWeese-Scott C., Geer L.Y., Gwadz M., He S.Q., Hurwitz D.I., Jackson J.D., Ke Z.X., Lanczycki C.J., Liebert C.A., Liu C.L., Lu F., Marchler G.H., Mullokandov M., Shoemaker B.A., Simonyan V., Song J.S., Thiessen P.A., Yamashita R.A., Yin J.J., Zhang D.C., and Bryant S.H., 2005, CDD: a conserved domain database for protein classification, Nucleic Acids Research, 33(S1): 192-196
Marchler-Bauer A., Anderson J.B., Derbyshire M.K., DeWeese-Scott C., Gonzales N.R., Gwadz M., Hao L., He S.Q., Hurwitz D.I., Jackson J.D., Ke Z.X., Krylov D., Lanczycki C.J., Liebert C.A., Liu C.L., Lu F., Lu S.N., Marchler G.H., Mullokandov M., Song J.S., Thanki N., Yamashita R.A., Yin J.J., Zhang D.C., and Bryant S.H., 2007, CDD: a conserved domain database for interactive domain family analysis, Nucleic Acids Research, 35(1): 237-240
Mas P., 2005, Circadian clock signaling in Arabidopsis thaliana: from gene expression to physiology and development, International Journal of Developmental Biology, 49(5-6): 491-500
Mcclung C.R., 2001, Circadian rhythms in plants, Annual Review of Plant Biology, 52(52): 139-162
Mizuno T., 2004, Plant response regulators implicated in signal transduction and circadian rhythm, Current Opinion in Plant Biology, 7(5): 499-505
Onai K., and Ishiura M., 2005, PHYTOCLOCK 1 encoding a novel GARP protein essential for the Arabidopsis circadian clock, Genes to Cells Devoted to Molecular & Cellular Mechanisms, 10(10): 963-972
Schöning J.C., Streitner C., Page D.R., Hennig S., Uchida K., Wolf E., Furuya M., and Staiger D., 2007, Auto-regulation of the circadian slave oscillator component AtGRP7 and regulation of its targets is impaired by a single RNA recognition motif point mutation, Plant Journal, 52(6): 1119-1130
Strayer C.A., Oyama T., Schultz T.F., Raman R., Somers D.E., Más P., Panda S., Kreps J.A., and Kay S.A., 2000, Cloning of the Arabidopsis clock gene TOC1, an autoregulatory response regulator homolog, Science, 289(5480): 768-771
Streitner C., Danisman S., Wehrle F., Schöning J.C., Alfano J.R., and Staiger D., 2008, The small glycine-rich RNA binding protein AtGRP7 promotes floral transition in Arabidopsis thaliana, Plant Journal for Cell & Molecular Biology, 56(2): 239-250
Tang W.Q., 2008, Bioinformatics analysis of plant light signal transduction pathway, Thesis for M.S., Fujian Agriculture and Forestry University, Supervisor: Wu W.R., pp. 28-31 (唐唯其, 2008, 植物光信号传导途径的生物信息学分析, 硕士学位论文, 福建农林大学, 导师: 吴为人, pp. 28-31)
Toh K.L., Jones C.R., He Y., Eide E.J., Hinz W.A., Virshup D.M., Ptacek L.J., and Fu Y.H., 2001, An hPER2 phosphorylation site mutation in familial advanced sleep phase syndrome, Science, 291(5506): 1040-1043