Index of /files/genomes/Arachis_ipaensis/annotation/maker

Name                       Last modified     Size  Description
Araip.K30076.a1.M1.tar.gz  2015-01-26 15:09  61M   Maker gene models, including GFFs, CDS, peptides, annotations
README.txt                 2015-01-26 12:39  13K
older/                     2015-01-25 14:15  -     Previous assembly versions. See release notes in README below.
usage_agreement.txt        2014-10-26 14:22  1.5K  Usage agreement for the Peanut Genomics Consortium gene models

README file for MAKER-P annotations on assembly version 1 of 
A. duranensis and A. ipaensis.

Steven Cannon, Andrew Farmer, Longhui Ren, Sudhansu Dash, Ethy Cannon

Initial draft July, 2014; see subsequent REVISION NOTES below at **.

Please also see the "usage notes" file in this directory.

The gene models were called using MAKER-P, by Andrew Farmer, for the Peanut Genomics Consortium,
with the following sequence inputs: transcriptome assemblies of A. ipaensis, A. duranensis,
and A. hypogaea (from David Bertioli, Mark Burow, Peggy Ozias-Akins, Brian
Scheffler, and Scott Jackson); protein sequences for Glycine max (Wm82.a2.v1),
Medicago truncatula (v4.0), and Phaseolus vulgaris (v1.0); and transposable elements
from David Bertioli et al. Genes were assigned identifiers and filtered by Steven Cannon
and Longhui Ren for quality (protein and transcript AED scores <0.75) and for repeat content
(genes with >=50% transposon/TE-element homology were removed, as were genes matching
additional repeat classes identified by Longhui Ren).

Functional descriptions of gene models are available in the *.AHRD.* files.
These annotations were gathered using the Automated Assignment of Human Readable
Descriptions (AHRD) tool, by Andrew Farmer, using the following search targets:
Arabidopsis v10, Medicago v4.0, soybean v. Glyma.Wm82.a2.v1, and
InterProScan 5.3-46.0 (targeting UniProt90 (2014) and Gene Ontology (2014)). 
There are two variants of the *.AHRD.* files: the full AHRD results in
*.AHRD.csv, and an abbreviated, two-column format in *.AHRD.slim .

REVISION NOTES:
----------
**Revision NOTE added Jan 25, 2015
There were too many small gene models and peptides, e.g. 67 peptides < 20 aa for Aradu
and 91 peptides < 20 aa for Araip. Homology support for these is limited to nonexistent.
Predicted genes were therefore removed (moved to ".lowqual_or_TE") if they 1) are less
than 60 aa / 180 nt and 2) are annotated only as "Unknown protein" in the AHRD annotation.
For Aradu and Araip respectively, this results in 568 and 693 gene models being moved
to the .lowqual_or_TE files.
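
The two-part test in this note can be sketched in Python as follows. This is a
minimal illustration, not the script that was actually used: the function names,
the dictionary inputs, and the exact-match comparison against "Unknown protein"
are all assumptions.

```python
def is_small_unknown(peptide, description, min_aa=60):
    """Return True if a gene model meets both criteria from this note."""
    # Ignore a trailing stop-codon symbol, in case the peptide FASTA includes one.
    length = len(peptide.rstrip("*"))
    return length < min_aa and description.strip() == "Unknown protein"


def split_models(peptides, descriptions):
    """Partition gene IDs into (kept, moved) lists.

    peptides:     dict of gene ID -> peptide sequence (hypothetical input)
    descriptions: dict of gene ID -> AHRD description (hypothetical input)
    """
    kept, moved = [], []
    for gene_id, seq in peptides.items():
        desc = descriptions.get(gene_id, "")
        if is_small_unknown(seq, desc):
            moved.append(gene_id)   # would go to the .lowqual_or_TE files
        else:
            kept.append(gene_id)
    return kept, moved
```

Both conditions must hold: a short peptide with an informative AHRD description
is kept, as is a long peptide described only as "Unknown protein".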
 
----------
** Revision NOTE Oct 28, 2014, for the initial public (pre-publication) release of
the gene models at PeanutBase on Nov 4. 

The genes (peptides, nucleotides, and GFF features) were renamed from the original
MAKER IDs to names with the form
  Aradu.Q1VD6.1
  Aradu.52P3J.1
  Aradu.L1DQB.1

  Araip.B0ISR.1
  Araip.YXM0P.1
  Araip.0Z4JV.1

The annotation versions for these assemblies are indicated with the following 
strings, giving genus-species (Aradu, Araip), genotype (V14167, K30076), 
assembly-version (a1), and method-annotation-version (M1 for MAKER annotation 1):
  Aradu.V14167.a1.M1
  Araip.K30076.a1.M1
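
For scripts that need to take these strings apart, a sketch along the following
lines may help. The regular expressions are inferred only from the examples shown
above and are assumptions, not a formal grammar for the names: they assume a
five-letter species prefix, a five-character key, "M" as the only method letter,
and a numeric suffix whose meaning is not stated here.

```python
import re

# Annotation-version strings such as "Araip.K30076.a1.M1".
VERSION_RE = re.compile(
    r"^(?P<species>[A-Za-z]{5})"
    r"\.(?P<genotype>[A-Za-z0-9]+)"
    r"\.a(?P<assembly>\d+)"
    r"\.M(?P<annotation>\d+)$"
)

# Feature names such as "Araip.B0ISR.1".
NAME_RE = re.compile(
    r"^(?P<species>[A-Za-z]{5})\.(?P<key>[A-Z0-9]{5})\.(?P<suffix>\d+)$"
)


def parse_version(text):
    """Split an annotation-version string into its named parts."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError("not an annotation-version string: " + repr(text))
    return match.groupdict()


def parse_name(text):
    """Split a renamed feature name into species, key, and numeric suffix."""
    match = NAME_RE.match(text)
    if match is None:
        raise ValueError("not a feature name: " + repr(text))
    return match.groupdict()
```

For example, parse_version("Araip.K30076.a1.M1") yields the species "Araip",
genotype "K30076", assembly "1", and annotation "1" as strings.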

The original MAKER predictions were also filtered for quality and mobile-element
contamination. Genes with MAKER protein and expression AED scores >= 0.75 were 
removed (approximately 5% of the original models). 
Genes were also removed that had mobile-element homology over >= 50% of the transcript
sequence at >= 80% identity and E-value <= 1e-10 (blastn), with comparison to 
mobile elements identified by Bertioli, Gao, Jackson et al. in May, 2014
(mobile-elements-AA051914.fasta and mobile-elements-BB051914.fasta for A. duranensis
and A. ipaensis, respectively).
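
The mobile-element criterion can be illustrated with the sketch below. It is not
the pipeline that was actually run. It assumes the blastn hits for one transcript
have already been read into (percent identity, query start, query end, E-value)
tuples, and it assumes that overlapping hits are merged before coverage is
measured; the notes above do not say how coverage was computed.

```python
def te_covered_fraction(hits, transcript_length,
                        min_identity=80.0, max_evalue=1e-10):
    """Fraction of a transcript covered by qualifying mobile-element hits.

    hits: iterable of (pident, qstart, qend, evalue) tuples with 1-based,
          inclusive query coordinates (hypothetical input layout).
    """
    # Keep only hits that pass the identity and E-value thresholds.
    intervals = sorted(
        (min(qstart, qend), max(qstart, qend))
        for pident, qstart, qend, evalue in hits
        if pident >= min_identity and evalue <= max_evalue
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the running interval: count it, then start anew.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / transcript_length


def is_te_contaminated(hits, transcript_length, min_fraction=0.5):
    """True if qualifying hits cover at least min_fraction of the transcript."""
    return te_covered_fraction(hits, transcript_length) >= min_fraction
```

Merging intervals keeps two overlapping hits from being counted twice, so the
covered fraction never exceeds 1.0 when the coordinates lie within the transcript.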

In total, after filtering for quality and TE contamination, the following numbers of 
gene predictions were retained:
39313/41542 = 94.63%
44436/47338 = 93.87%

----------
**Revision NOTE added Sept 17, 2014 (prior to public release)
We have added a CDS file, Araip.K30076.a1.M1.cds.fa. This is in addition to the transcripts
file, which contains UTR sequence when present.
The GFF files have also received a minor update: multiple CDS features that belong to a single
mRNA now use the same ID, per the GFF3 spec (http://www.sequenceontology.org/gff3.shtml).
Earlier, each CDS had a unique numeric suffix.
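
The kind of GFF change described here can be sketched as follows. This is an
illustration only: the "<Parent>:cds" naming scheme is hypothetical and is not
necessarily the ID form used in the released files, and the sketch assumes each
CDS row has a single Parent.

```python
def share_cds_ids(gff_lines):
    """Yield GFF3 lines in which all CDS rows of one mRNA carry the same ID."""
    for raw in gff_lines:
        line = raw.rstrip("\n")
        fields = line.split("\t")
        # Pass through comments, directives, and anything that is not a CDS row.
        if line.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            yield line
            continue
        attrs = [a for a in fields[8].split(";") if a]
        parent = next(
            (a[len("Parent="):] for a in attrs if a.startswith("Parent=")),
            None,
        )
        if parent is None:
            yield line
            continue
        new_id = "ID=" + parent + ":cds"  # hypothetical naming scheme
        # Replace an existing ID attribute, or add one if none was present.
        attrs = [new_id if a.startswith("ID=") else a for a in attrs]
        if new_id not in attrs:
            attrs.insert(0, new_id)
        fields[8] = ";".join(attrs)
        yield "\t".join(fields)
```

In GFF3, rows that share an ID are treated as segments of one discontinuous
feature, which is why the exons' worth of CDS for one mRNA share a single ID.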

----------
**Revision NOTE added Aug 24, 2014, prior to public release:

Changes in A. ipaensis:
The scaffold Aipa130 in Araip.B_unplaced was redundant with Aipa130_1 in Araip.B09
and Aipa130_2 in Araip.B05, and has been removed from the August 24 version of Araip.B_unplaced
in both Araip_v1.0_by_scaff/ and Araip_v1.0_by_scaff_RM/ of the genome assembly files.
This affects 354 genes called on the redundant Aipa130 scaffold. Those gene models have been
removed from all annotation files in this directory (Araip.K30076.a1.M1).

Additional scaffolds were removed if they contained no predicted genes and
either contained fewer than 2000 non-N bases or contained more than 50% Ns.
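
The scaffold rule above can be written as a small predicate. This is a sketch
under stated assumptions rather than the code that was used: the function name
and arguments are hypothetical, and "more than 50% Ns" is read as a strict
inequality on the fraction of N characters in the scaffold.

```python
def should_remove_scaffold(sequence, gene_count,
                           min_non_n=2000, max_n_fraction=0.5):
    """Return True if a scaffold meets the removal rule described above."""
    # Scaffolds with any predicted gene are always kept.
    if gene_count > 0:
        return False
    n_bases = sequence.upper().count("N")
    non_n = len(sequence) - n_bases
    # Too little real sequence (this also covers an empty scaffold).
    if non_n < min_non_n:
        return True
    # Mostly gap characters.
    return n_bases / len(sequence) > max_n_fraction
```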

These gene models (formerly on "unplaced" scaffold Aipa130) were removed from the files in
the Araip.K30076.a1.M1 directory (prefix each name below with "Araip."):
001HH  02Z4V  06ZCW  0EN0U  0F3X9  0H040  0J0WE  0NG5K  0P5A3  0S673  0U7AR  0V6B5  0V6Q3  10834 
11V6U  11WWK  191S0  1A3G5  1JS0Q  1TI9F  1X61U  20DHM  246V8  27DFQ  2H2XI  2XX51  2Y76G  33B51 
36RPY  382L0  3B5QF  3E5BI  3EU6Z  3G932  3TJ1R  3TL1S  3UU6I  3YF7K  3Z918  48N72  49QM0  4B0W3 
4CX91  4GZ90  4L1IS  4N3CM  4P5SD  4S1SQ  4X2BC  51ZT3  52X8I  5994B  5BK8E  5C60L  5F10W  5GM0Y 
5H3I4  5KW81  5N54K  5Q4I7  60YEK  61QYC  63M4Y  64BF1  65GTF  68S57  6IS1S  6JD1A  6NV39  6Z4AG 
72SZL  74GX4  762C5  7A0QJ  7C4JH  7HA2S  7MA01  7MM2E  7MW9M  7SH75  82C7E  82Z59  834F3  85V66 
89VWU  8DE77  8HI9H  8J5CX  8R0UV  8SJ3N  8XL90  90GP7  92419  988T9  99B8S  9A7X3  9MA66  9N2Q0 
9P8TA  9T779  9UJ99  9V5NN  A07QG  A1BI1  A49DS  A4LQJ  A4ZSE  A93VH  AA9ZR  ACD03  AG58Q  APR08 
APU2T  AR8Y7  AS9HJ  AT3YL  AWQ49  B10FN  B1L2M  B8VGC  BBI52  BJ9CA  BM2G1  BR7F6  BXW4F  BZ9V8 
C1J08  C5Y1R  C6DHI  C6L0M  C6TFG  CDC60  CEZ65  CG5DQ  CH5U0  CKZ5C  CNE4H  CQ5A0  CU71Q  D2NBN 
D52AE  D6RLZ  DIK04  DL29J  DS5YJ  DSF24  DU40F  DWP66  E0AWR  E0G28  E1U0J  E3VQW  E5CPA  EA4I0 
EF9ZJ  EI8GP  ENU8C  EU30W  F1U1E  FDZ71  FEY1V  FG5CY  FLC89  FYE6B  G6CGJ  G7NV2  G921Y  GDU7F 
GJ3Q7  GKU2F  GW8Q6  H18TL  H1BKI  H202T  H3A0U  H65BE  H7MKZ  H7VYG  HE1NN  HFM73  HFU9N  HN235 
HP7WG  I04NA  I1418  I1V6R  IA39G  IHN7R  IWI6A  IX6R4  J12XP  J19HJ  J24J2  J37VJ  J4HAW  J69EC 
J8930  J9L0G  JIP2K  JQ667  JRB8P  JX2KH  JY3HH  JZ9KR  K06DM  K52UZ  K84UG  K8CT6  K8IN7  K9BQ0 
KJ86A  KJA9S  KJB2J  KKG4H  KL0MV  KM2QI  KMW57  KNM04  KP64P  KP7XC  KR5SF  KT1XB  KVW1Z  L1P2T 
L56VN  L584C  L5ECX  L6L3T  L7E5J  LG7UB  LK3PY  LQL4E  M0MN2  M0Y1P  MD7VQ  MI5RZ  MNW79  MS6GR 
MS8NL  N1ICA  N2MLL  N4RYK  N5NQC  N602U  N71C2  NCD5X  NPA6A  NT3D2  P2QHR  P5FQM  P6YKS  P8GNE 
P93B6  PIK9D  PN9NZ  PQ5K8  PT08B  Q0F61  Q3ZGY  Q7KV7  Q81CP  QC7YU  QHM9J  QN97S  QV9QQ  QX0BP 
QY045  R071U  R10PX  R1CC3  R1PY4  R1Z8R  R5DNV  R685C  R7AGB  R8J56  RER57  RG195  RG4U0  RIJ35 
RM3WY  RM46K  RM7Y7  RMX8H  RR0G3  RX9HR  S1WWX  S32ST  S4BB2  S4I0I  SB6WT  SL6PU  ST18R  SU71A 
SX5BJ  T2DLR  T3SVY  T5GMQ  T721N  TI4U6  TI5K6  TP95J  TPR49  TTB4U  U5IFC  U898F  UID9F  UY1F0 
UYG5P  UZ2GB  V37XU  V5URA  V6279  V7VZV  VJM4D  VL4N0  W1CD4  W1DTH  W2X0L  W4DVX  W9P7Y  WG93D 
WN7HL  WVY8P  X0JL3  X2VQK  X3IJW  X50ZE  X5F92  X61IM  X85BP  XJK2Q  XP0M3  XSG1G  XU1PH  XW6Y2 
XY6WZ  XZ181  Y7N32  Y8D1E  YB015  YB3KF  YEY1T  YS5I5  YUT1V  YYU3B  Z1Z8F  Z2VDL  Z9495  ZF2P3 
ZG60Q  ZS87I  ZTA8B  ZWS5Q 
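
To check an older local copy against a list like the one above, the short keys
can be expanded to full names and used to filter a FASTA file. The sketch below
is illustrative only. The function names are hypothetical, and it assumes record
names of the form Araip.001HH.1, i.e. the gene name plus a numeric suffix.

```python
def expand_keys(text, prefix):
    """Turn whitespace-separated short keys into a set of full gene names."""
    return {prefix + key for key in text.split()}


def drop_removed(fasta_lines, removed_genes):
    """Yield FASTA lines, skipping records whose gene is in removed_genes."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # First whitespace-delimited token of the header is the record name.
            record_id = (line[1:].split() or [""])[0]
            # Strip a trailing ".N" suffix to get the gene name (assumed layout).
            gene = record_id.rsplit(".", 1)[0]
            keep = gene not in removed_genes and record_id not in removed_genes
        if keep:
            yield line
```

The same two functions apply to a list that uses a different prefix, such as
"Aradu.", by passing that prefix to expand_keys.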

Changes in A. duranensis:
The following 11 scaffolds were included in both the pseudomolecule
assemblies and in the set of unplaced scaffolds, Aradu.A_unplaced (prior to August 24, 2014).
    Adur78_2
    Adur188_2
    Adur356_2
    Adur104_4
    Adur18_2
    Adur227_2
    Adur147_2
    Adur1_2
    Adur160_2
    Adur66_3
    Adur101_2
Additionally, there were 903 superfluous genes called on those "extra" unplaced scaffolds.
In the 2014-08-24 version of the A. duranensis assemblies and annotations, those scaffolds
have been removed from the file Aradu.A_unplaced in the directories
Aradu_v1.0_by_scaff and Aradu_v1.0_by_scaff_RM, and genes called on those scaffolds have
been removed from the fasta, GFF, and AHRD annotation files in the
directory Aradu.V14167.a1.M1.

These gene models were removed from the files in the Aradu.V14167.a1.M1 directory 
(prefix each name below with "Aradu."):
00WI3 010AF 01PSP 03J3G 03K3M 04LKD 04TW1 053H1 058P7 08BSD 0C0MZ 0G36K 0P4IL 0PH45 0PZ4B 0QP09
0TG8Q 0U7H3 0U7U1 0UT21 0W5JE 0WU97 0X98C 11JVT 11LC5 14WNH 15VB5 17XR5 18UI5 18XNJ 1C5II 1CA35
1CK9V 1D1BV 1I7ZM 1IH1I 1K1VQ 1K23X 1M5FX 1QQ4H 1V6Z2 1W0J2 1W8IU 207ZG 22JUA 22L89 23QEM 243G3
24ACJ 26MUT 26QA2 26ZAW 28I5Y 28IVU 290E2 29VF5 29ZN0 2AR2F 2AR54 2AX50 2B3C9 2D0VN 2D5LW 2E2GR
2GP2V 2H4TA 2LU0W 2MR7V 2N7SZ 2P542 2UM0D 2V6BH 2WQ5F 2X0L9 2X2Y5 2YQ54 307DU 30K2F 313GI 31IZ5
32V53 32WW0 33B1M 33GCR 34T26 35LMG 35RPL 362JN 36QKJ 37083 37LEC 3DB7N 3F84I 3F8HG 3J569 3N1K8
3UZ6D 3V91X 3W0KF 3YV4V 404ED 416CR 41ZKK 42CUK 4318X 44724 44L9P 45NSB 45Q84 47AM2 49DC7 4C39C
4C83C 4CD9W 4E706 4FT5E 4GY43 4J5Y5 4JR0X 4LU1C 4M4K7 4P14Z 4R3QH 4R9F6 4VE38 4WQ1W 4ZZ0R 504XE
50YP0 51162 51HYJ 51W1W 51ZC4 5269V 52EKJ 52SW9 561GZ 56H49 57ZKN 59LWX 5A8XW 5AV2X 5C4ZJ 5G9IH
5HW4R 5P8WX 5RG18 5SU8G 5X6KB 5Z0CS 5Z12X 60C6H 60CFK 60RJ3 61INU 627SR 62A6R 62TRQ 63BLD 63H62
648Z3 65A0S 66SJR 66WPZ 66XNB 67C0L 68QX7 69NSX 6E4YY 6EM92 6IY6Z 6QK9J 6V6W3 6Y97Z 6ZR1U 70ALM
710U6 717GE 71FF9 733TY 74YVR 75BQC 75HNX 76A7W 76E9R 76NRD 770BB 79EIY 7B9IM 7BJ65 7C9W3 7D4FH
7FV8N 7J4AJ 7KC8T 7KV1T 7Q2VU 7Q8Z6 7RE3K 7S0WZ 7S1M8 7S6LW 7SI7W 7V2EZ 7W2FA 7WJ00 80EUK 81P3T
824ZX 82F9Z 8513S 85UC6 86HY5 87ZI4 88NIZ 892SB 893KH 89L2W 8CC8N 8D7SJ 8G1GA 8J88V 8JI5A 8L0KQ
8L9MM 8S2PK 8T3DH 8TJ96 8UM0S 8V9Q0 8WV9V 8XT0T 8Y6N4 8ZY1L 90REG 90W4K 91YE3 922SX 92T50 932NB
93GDY 94265 94IAX 94RVU 94TFN 95AKT 97732 98CXU 9AA02 9AJ9H 9AM22 9BI7Y 9C2L5 9CP3L 9D16Q 9G7G7
9K4I3 9LX3W 9MY2F 9NC04 9S29G 9U6L7 9WU7B 9X4C4 A0E57 A1JC9 A1VPA A2K69 A2M60 A494Z A4Y0L A5DEY
A66VW A8IJ3 A8T3C A94Z9 AAF4A AB0PX ACI3B AG56E AG60J AHV1U AJ5Q6 AL4CU AMC51 APF3P AQT88 ASN2U
AU0DL AVA62 AZJ79 B0K9F B0MJD B3GVM B70AZ B7570 B92UD B94HU BB7FT BC2SV BE9GX BH6CN BKZ4I BNC1L
BTM12 BU2CD BUF42 BUW8Z BYY3P C09P0 C0A7J C0GEI C0JB1 C0PDL C10BU C12FH C2GI9 C2QU3 C2R6U C2RQG
C4UVM C55TX C8TPC C990X C9TBG CBM8P CC394 CHX1I CK6DV CNI8Y CNL78 CPV2P CQ0YF CQ756 CS4P1 CTJ4Y
CTV3M CU8F7 CVW1T CW7RI CZX0F D08T1 D0CI9 D0KDL D0YRW D1DQK D2IYV D2P40 D5CB8 D6DEA DCF72 DD2A0
DE3A1 DEZ7N DFY5I DG6Z6 DI59R DL21B DL4YB DNL1Q DNS9R DSN7Z DSV3C DT1LP DU8AN DU9FI DVS2N DZG03
DZL7S E0HWF E0YFU E2UGM E31I6 E38T9 E48XE E49Q1 E5FTU E5Q5M E5U92 E79DN E8IX5 E9I1E ECN66 EGU7F
EIQ8P EL4LP ELZ3H ENN9V EQ8NC EW1FX EWQ7K EY0YY F01T2 F0ALU F251V F2H9N F316A F37TY F4KR9 F60LN
F6MC6 F7199 F7J8I F87V8 F8K0B F9UXT FA4NI FA5BN FG5Q3 FG9T6 FH9NF FI3HW FJ2FV FL8AK FQ2U5 FQ2XY
FRG9Y FS3I6 FUQ2Z FV64G FZ0DS G057B G122T G1FRZ G2GRH G2RH4 G3KS3 G3MMJ G4FN9 G4QF2 G5PZC G6WFJ
G70Y8 G8IUQ G9T5Y GA240 GD3QE GDJ3R GE8P2 GG0Y9 GH1VN GKK8G GLR8U GMM47 GN0TI GNH9P GP74Y GPE06
GS2GP GT0CF GX13N H10J9 H16TI H1I4I H1YKZ H24V3 H2Y3B H32IE H6FFC H7W2R H7WZQ H9G4B HC5F3 HE5EE
HHV1Q HJ9IE HL2X5 HR0TA HW2U1 HX0YV HZ92E I1CM7 I3P3A I4K5K I56DI I5DIS I5TTL I5WEB I6EPR I73TU
I89B8 I930E I96CM IF63Z IFG3C IG97T IJ4QD IJ6AP ILB4D IPK0C IS156 ISL2N IT9WR IUB03 IUH05 IVM2H
IZ9D6 J0AXS J15PL J2RY4 J2W5H J2ZI7 J3K06 J4Z68 J5LI7 J5VF4 J6QI2 J8GYL J8NFU J8QGL J8RG6 J93PN
JBA0H JBU55 JC5I4 JD2AB JD8C7 JG786 JI57H JIA2E JIT0Y JK5F1 JK5FG JK7GM JKF9B JPM9K JTN2J JVK79
JVM3M JZF0M K06CR K0G9I K1J6J K35WP K653K K8RXH K9082 K98WP K9P7L K9TT7 KE23Q KE9M8 KI4IE KL77T
KM1U5 KQ1RX KQ37H KQT7T KUF6A KUH9N KV4PJ KV9EZ KX1E1 KY82Y L1AC3 L2J0Z L4XX8 L70DS L7S17 L8V6K
L8WYG L9QN9 LC8QY LCY4J LD7RX LE3SD LEW8C LF603 LHM2V LHX57 LI3GF LN78U LNN17 LNY6C LQ3VB LRZ9Y
LS9KE LTM0S LTR2J LUD2Y LUW2B LV8SW LW356 LX3ZP LX8MV M00C4 M3E5L M4I60 M4U3S M5L0P M5XFH M6I7N
M6MWV M6QPH M7YKX M8A8M M8JCE M9MNZ MC9NU MH9Y9 MHT8T MKW38 ML9X7 MMS44 MT259 MUW6J MV9GC MWS57
MY4FE N015W N0URY N1AI0 N1CLX N20ZD N2CK4 N3M3Q N3MFS N56IX N6LJA N6Y79 N71XV N7UKE N86MA N8K4T
N8NCM N8VV6 N9NZK NA8X5 NBL5T NE6YH NE70Y NE9E2 NF700 NGX5V NJP6R NK9MS NQJ4T NQU91 NTB1E NU4BV
NUI22 NUJ3Q NV3J7 NW35K NWW9X P06Z7 P2ZRF P386W P6KTH P6YMH P8BFP PA2MZ PBU8I PC2AX PD89I PDF2R
PFW0B PHC7R PLY8F PMA2E PQL32 PSE34 PSQ1L PTE9M PU0DM PU5YN PY98M Q04JW Q0NEZ Q16M9 Q2NN2 Q3U90
Q4NVC Q7QBT Q92YN Q9CZP QB9GT QC2QK QC5X8 QD18X QEK9M QHG5G QID9H QK2VM QL7IN QP3M9 QP8P2 QS3SN
QVV1J QX4TE QXR7D R0350 R0KAD R184U R3KI1 R4MK6 R65KK R7KN5 R8Z76 R9FBF R9NSU RC2BP RC6Q7 RES7Q
RF06B RF0JR RGN9S RLW3J RMX6D RQ3J1 RV3FN RX7V0 S07A8 S0P69 S1922 S1MRX S1NGF S1RUZ S4CNB S4RMB
S5LUE S6D4I S6SP0 S8LAR S9CBE SBR2U SD1DB SH0V8 SH8XW SJC13 SK7QE SKA72 SMK3P SP5GZ SU5GT SX5EK
SXW7K SZ3CX T1DXF T2WF8 T4CIA T58KK T5M2E T6L8V T8653 T8HBT T94F3 T9AL6 T9N0L TAP4T TE2VX TF9WG
TH4TV TJY1A TNB5G TP5R1 TX552 TX5BS TX77R U045W U2GIU U2QJ9 U3G32 U4GTZ U529I U58FJ U5IBN U60W2
U82SD U8V9F U91CQ U9WGE UA7KU UF91B UG4MQ UHL5D UI467 UID1H UJ8QF UK72N UM228 UP5NR UP8M6 UQY3Q
UR5GG US24P UU2R5 V14D6 V26HY V2CVT V445F V4NMM V4Z9D V5CQ8 V5CYT V5JYT V61KM V6XWW V8RQK V8ZX4
VAJ1X VB73V VE692 VES0R VG434 VPQ6I VQ5HL VTN1U VWC8Z VY0N6 VY5K3 VZ4D9 W0EV7 W2QG6 W33IA W3U7X
W3ZTA W421R W4BNN W5VR5 W8VBN WFJ21 WI4IW WJ004 WK5BD WN7RS WP0M9 WRP32 WTM51 WUX19 WV3HN WVM6X
WWD2H WX095 WX96A WZ0M1 X022Z X1QBE X1RWA X22BC X2BY3 X2SJF X3TLC X51X4 X70JH X7UWR X8EIG X9G4M
X9TRJ XA6P1 XAA5G XFQ5P XG0RC XHB4X XIY1D XLU6T XN79B XNV8S XST3B XWI9Y Y08G4 Y13AY Y1CEY Y2WED
Y4IK7 Y5GLF Y5Q38 Y6E2C Y6X3G Y7J8H Y8KDZ Y9866 Y99YE Y9BEP Y9JLR YA24U YA37T YB2YM YD458 YK4XN
YM61D YNS0R YP6JQ YQ80J YV9YU YYM4N Z1XSE Z2YHK Z5XJD Z6AKE Z6ZDX Z72ZX Z7CED Z9475 Z95K4 Z9F6Q
ZCH1M ZIJ0V ZJ1LH ZND2U ZS624 ZWN3N ZZX5A

Additional scaffolds were removed if they contained no predicted genes and
either contained fewer than 2000 non-N bases or contained more than 50% Ns.