Using the KNIME workflow platform, the user can select and run the desired workflows on a client PC or a server. Within the platform, various components and parameters can be specified.

We use the free version of KNIME developed at the University of Konstanz. KNIME is an Eclipse-based workflow platform that uses nodes as processing units. By combining nodes, users can construct workflows that read, calculate, analyze and visualize data. We are also developing dynamic analysis platforms using semantic web technologies.

 Local PC

The user downloads the programs and executes them on the user's own PC (e.g. Windows, Linux, MacOS). There are two types: a component type and a combination type.


-- Combination type
>>>Molecular Simulation Active Workflow
>>>RNA Structure Prediction Active Workflow
>>>Protein Structure Prediction Active Workflow
>>>PhylogeneticTree (DNA, RNA, Protein) Active Workflow
Installation manual for KNIME and combination type active workflow
Combination type active workflow user's manual


AISTViewer is a visualization node to display results of sequence analysis. At present, the node can display results of the following nodes:
CentroidFold_AIST: predicts RNA secondary structures.
Poodle_AIST: predicts disorder regions on a protein sequence.
Last_AIST: compares and aligns sequences for amino acid or nucleotide.
Mafft_AIST: multiple alignment program for amino acid or nucleotide.
Blast_AIST: finds regions of local similarity between sequences.
ClustalW_AIST: multiple alignment program for amino acid or nucleotide.
PsiPred_AIST: predicts secondary structure regions on a protein sequence.
DisoPred_AIST: predicts disorder regions on a protein sequence.
Memsat_AIST: predicts transmembrane regions on a protein sequence.

=Ports=
in-port: connects to the out-port of any one of the following KNIME nodes:
CentroidFold_AIST
Poodle_AIST
Last_AIST
Mafft_AIST
Blast_AIST
ClustalW_AIST
PsiPred_AIST
DisoPred_AIST
Memsat_AIST

=Views=
CentroidFold_AIST: displays PNG files of RNA secondary structure prediction results.
Poodle_AIST: displays a line plot of disorder probability and a color-coded FASTA format sequence.
Last_AIST: displays a dot plot PNG file of aligned regions.
Mafft_AIST: displays a color-coded multiple alignment.
Blast_AIST: displays horizontal squares that represent regions of local similarity between sequences.
ClustalW_AIST: displays a color-coded multiple alignment.
PsiPred_AIST: displays horizontal squares and color-coded sequences that represent secondary structure regions.
DisoPred_AIST: displays horizontal squares and color-coded sequences that represent disorder regions.
Memsat_AIST: displays color-coded transmembrane regions.

AlignmentFileReader sets the absolute path of an alignment file on its out-port.

=Options=
Alignment File: set an absolute path of an alignment file.

=Ports=
out-port: an absolute path of an alignment file.

This node executes Ammos, which employs an automatic procedure, based on distance geometry, for generating 3D conformations of small molecules.
Please visit the AMMOS web site (http://www.mti.univ-paris-diderot.fr/fr/downloads.html) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Ammos_AIST.

=Ports=
in-port: an absolute path of a ligand Mol2 file.
out-port: an absolute path of an Ammos result Mol2 file.

AutoDock_AIST is a node that executes AutoDock, a popular protein-ligand docking program developed at the Scripps Research Institute (http://autodock.scripps.edu), via SOAP. The user needs to provide two inputs: a target protein PDB file (a single-chain protein, NOT a protein complex) without bound ligands, and a MOL2-format molecule file. The user can also run AutoDock with a specified binding site coordinate (x, y, z). The program automatically identifies potential binding sites and calculates the binding energy.

=Options=
Specify binding site coordinate (X, Y, Z): to use this option, check "use" and specify the coordinates (double values).
Select Output Directory: specify an absolute path of a directory for storing results of AutoDock.

=Ports=
in-port0(top): an absolute path of a PDB format file.
in-port1(bottom): an absolute path of a MOL2 format file.
out-port: an absolute path of output files.

BlastForModeller_AIST is a node that executes the Basic Local Alignment Search Tool (BLAST) via SOAP; it is limited to one specific application, protein structure modelling. BLAST is a homology search tool that finds regions of local similarity between sequences.
Please visit the NCBI web site (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for further information. This node also uses a sample program and an XSL file provided by the BioJava wiki site (http://www.biojava.org/wiki/) to convert the BLAST XML format into HTML.

=Options=
Execution Type:
BLAST - execute the BLAST search against a PDB ATOM sequence database.
PSI-BLAST - execute the PSI-BLAST search against a non-redundant amino acid sequence database and a PDB ATOM sequence database.
E-Value: specify the E-value threshold, i.e. the number of hits expected to occur by chance in the database search.
Iteration: specify the number of iterations of the homology search. This parameter is used only for the PSI-BLAST search.
Select Output Directory: specify an absolute path of a directory for storing results of protein structure modelling.

=Ports=
in-port: an absolute path of a FASTA format sequence File.
out-port: an absolute path of a (PSI-)BLAST result file.

Blast_NCBI executes NCBI BLAST via REST. The user can run BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, PSI-BLAST, RPS-BLAST and MEGABLAST against a specified database with an E-value threshold. Advanced options are also available.
Please visit the NCBI BLAST web site (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for further information.


=Options=

Select Output Directory: specify an absolute path of a directory for storing BLAST results.

Programs: the user can select one of the BLAST programs below.
(The descriptions below are quoted from the NCBI BLAST site.)
BLASTN - search a nucleotide database using a nucleotide query.
BLASTP - search protein database using a protein query.
BLASTX - search protein database using a translated nucleotide query.
TBLASTN - search translated nucleotide database using a protein query.
TBLASTX - search translated nucleotide database using a translated nucleotide query.
PSI-BLAST - find members of a protein family or build a custom position-specific score matrix.
RPS-BLAST - search for Conserved Domains within a protein or coding nucleotide sequence.
MEGABLAST - search a nucleotide database using nucleotide sequences with the greedy algorithm.

Databases: the user can select one of the BLAST databases below.
(The descriptions below are quoted from the NCBI BLAST site.)
Protein databases:
nr - non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
refseq - protein sequences from NCBI Reference Sequence project.
swissprot - last major release of the SWISS-PROT protein sequence database (no incremental updates).
pat - proteins from the Patent division of GenBank.
month - all new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
pdb - sequences derived from the 3-dimensional structure records from the Protein Data Bank.
env_nr - non-redundant CDS translations from env_nt entries.
Smart v4.0 (only RPS-BLAST) - 663 PSSMs from Smart, no longer actively maintained.
Pfam v11.07255 (only RPS-BLAST) - PSSMs from Pfam, not the latest.
COG v1.00 (only RPS-BLAST) - 4873 PSSMs from NCBI COG set.
KOG v1.00 (only RPS-BLAST) - 4825 PSSMs from NCBI KOG set (eukaryotic COG equivalent).
CDD v2.05 (only RPS-BLAST) - 11399 PSSMs from NCBI curated cd set.

Nucleotide databases:
nr - all GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences).
No longer "non-redundant" due to computational cost.
refseq_mrna - mRNA sequences from NCBI Reference Sequence Project.
refseq_genomic - genomic sequences from NCBI Reference Sequence Project.
est - database of GenBank + EMBL + DDBJ sequences from EST division.
est_human - human subset of est.
est_mouse - mouse subset of est.
est_others - subset of est other than human or mouse.
gss - genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs - unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.
pat - nucleotides from the Patent division of GenBank.
pdb - sequences derived from the 3-dimensional structure records from Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record.
month - all new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
alu_repeats - select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994).
dbsts - database of Sequence Tag Site entries from the STS division of GenBank + EMBL + DDBJ.
chromosome - complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic.
wgs - assemblies of Whole Genome Shotgun sequences.
env_nt - sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is the Sargasso Sea project. This does NOT overlap with nucleotide nr.
Please check the NCBI BLAST web site for further database information.

E-value threshold: specify the E-value threshold. The default value is 1.0E-4.

Advanced: the user can specify one BLAST parameter (multiple parameters are not available).
(The descriptions below are quoted from the NCBI BLAST site.)
-G - cost to open a gap.
-E - cost to extend a gap.
-r - reward for match.
-q - penalty for mismatch.
-e - expectation value (E).
-W - word size.
-y - dropoff (X) for blast extensions in bits (default if zero) (Integer). default = 20 for nuc-nuc, 7 for other programs. Not applicable for megablast.
-X - x dropoff value for gapped alignment (in bits) (Integer). default = 30 for nuc-nuc (blastn and megablast), 15 for other programs.
-Z - final X dropoff value for gapped alignment (in bits) (Integer). default = 50 for nuc-nuc (blastn), 25 for other programs. Not applicable for megablast.
-P - 0 for multiple hits 1-pass, 1 for single hit 1-pass (Integer). Does not apply to blastn or megablast.
-A - multiple Hits window size (zero for single hit algorithm)(Integer).
-I - number of database sequences to save hits.
-b - number of database sequences to show alignments.
-v - number of database sequences to show one-line descriptions.
-Y - effective length of the search space.
-z - effective length of the database (use zero for the real size)(Real), default=0.
-c - pseudocount constant for PSI-BLAST (Integer), default=7.
-F - filtering directives.


=Ports=
in-port: an absolute path of a FASTA sequence(s) file.
out-port: an absolute path of a BLAST result file.
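
As a rough illustration of how the options above could map onto a REST request, the sketch below only builds a submission URL and sends nothing. The parameter names (CMD, PROGRAM, DATABASE, QUERY, EXPECT) follow the public NCBI BLAST URL API; how Blast_NCBI itself builds its requests is not described here, so the function name, the restricted program list and the example query are assumptions.

```python
from urllib.parse import urlencode

# Base URL taken from the text above; the node's real endpoint is not documented here.
BLAST_URL = "http://blast.ncbi.nlm.nih.gov/Blast.cgi"

# Program names accepted by this sketch (a subset of the node's list).
PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_submit_url(program: str, database: str, query: str,
                     evalue: float = 1.0e-4) -> str:
    """Return a 'Put' (submit) URL for one query; nothing is sent."""
    if program not in PROGRAMS:
        raise ValueError(f"unsupported program: {program}")
    if evalue <= 0:
        raise ValueError("E-value threshold must be positive")
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
        "EXPECT": evalue,  # default mirrors the node's 1.0E-4
    }
    return BLAST_URL + "?" + urlencode(params)


# Hypothetical protein query searched against the pdb database.
url = build_submit_url("blastp", "pdb", ">q1\nMKTAYIAKQR")
print(url)
```

Keeping URL construction separate from the network call makes the parameter handling easy to check before any job is submitted.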

CPHmodels is a protein homology modeling server. Template recognition is based on profile-profile alignment guided by secondary structure and exposure predictions. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes CPHmodels on the DTU server.
Please visit the CPHmodels web site (http://www.cbs.dtu.dk/services/CPHmodels/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a CPHmodels result directory.

CentroidFold_AIST is a SOAP version of the CentroidFold KNIME node. CentroidFold predicts an RNA secondary structure from an RNA sequence and is one of the most accurate tools for this task.
Please visit the CentroidFold Web site (http://medals.jp/elist/detail/17.html) for further information.

=Options=
Input type: select an input type format from FASTA or ClustalW.
Output: specify an output directory for execution result file.
Weight of base pairs: select a gamma value of weight of base pairs.
Advanced: set other advanced options (optional).

=Ports=
in-port: an absolute path of an input file.
out-port: absolute paths of output files.

ChloroP predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes ChloroP on the DTU server.
Please visit the ChloroP web site (http://www.cbs.dtu.dk/services/ChloroP/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a ChloroP result file.

ClustalW_AIST executes ClustalW, a multiple sequence alignment program for DNA or proteins, via SOAP.
Please visit the Clustal web site (http://www.clustal.org/) for further information.

=Options=
type: select either "PROTEIN" or "DNA".
Select Output Directory: specify an absolute path of a directory for storing results of ClustalW.

=Ports=
in-port: an absolute path of a Multi-FASTA format sequence file. (Because of restrictions in ClustalW, a sequential number is added to each header line in the Multi-FASTA file, and any spaces in the header lines are replaced by underscores "_".)
out-port: an absolute path of a ClustalW result file.
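
The header rewriting described for the in-port can be sketched as follows. This is a minimal illustration, not the node's code: the text does not say where the sequential number is placed, so the "<n>_" prefix used here, like the function name, is an assumption.

```python
def normalize_headers(fasta_text: str) -> str:
    """Number each FASTA header and replace spaces with underscores.

    The "<n>_" prefix is an assumed layout; the node's exact format is
    not documented in this section.
    """
    out = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            name = line[1:].strip().replace(" ", "_")
            out.append(f">{count}_{name}")
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return "\n".join(out)


example = ">sp P1 human\nMKT\n>sp P2 mouse\nMKS"
print(normalize_headers(example))
```

Numbering the headers keeps them unique even when two sequences share the same description.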

CompoundQuery_AIST searches for compounds in the Namiki database.
Namiki: you can send a query to Namiki, which is based on databases developed by Namiki Shoji Co., Ltd. (http://www.namiki-s.co.jp/).
*The maximum number of hits is 500.

=Options=
Database: Namiki
Search Words: click a check box if you use this option. To enter multiple search words, enclose each word in double quotation marks and separate the words with spaces.
Molecular Weight: click a check box if you use this option. Specify the range of molecular weight.
logP: click a check box if you use this option. Specify the range of logP.
TPSA: click a check box if you use this option. Specify the range of TPSA.
smiles: click a check box if you use this option. Input a SMILES string.
inchi: click a check box if you use this option. Input an InChI string.
inchikey: click a check box if you use this option. Input an InChIKey string.
Number of rotatable bonds: click a check box if you use this option. Specify the range of number of rotatable bonds.
Charge: click a check box if you use this option. Specify the range of charge.
H-bond Acceptor: click a check box if you use this option. Specify the range of H-bond acceptor.
H-bond Donor: click a check box if you use this option. Specify the range of H-bond donor.
Number of rings: click a check box if you use this option. Specify the range of number of rings.
Search Condition: choose either "and" or "or".
Output directory: an absolute path of output directory.

=Ports=
out-port: an absolute path of a MOL2-format file.
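
The quoting rule for the Search Words option can be sketched as below. The helper name is hypothetical and the node's own input handling is not documented here; the sketch only shows the quote-and-join rule stated above.

```python
def format_search_words(words):
    """Quote each search word and join the words with single spaces."""
    cleaned = [w.strip() for w in words if w.strip()]
    for w in cleaned:
        if '"' in w:
            raise ValueError(f"search word must not contain a double quote: {w!r}")
    return " ".join(f'"{w}"' for w in cleaned)


# Two hypothetical search words, one of which contains a space.
print(format_search_words(["kinase inhibitor", "benzene"]))
```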

CompoundSelector shows the query results from the database server. You can select molecules in the table. The program then writes the precalculated 3D coordinates of the selected molecules to a file in MOL2 format.

=Ports=
in-port: an absolute path of a MOL2-format file.
out-port: an absolute path of a newly generated (MOL2-format) file.

DictyOGlyc predicts GlcNAc O-glycosylation sites in Dictyostelium discoideum proteins using a neural network. It has been developed by the Technical University of Denmark (DTU). This node executes DictyOGlyc on the DTU server.
Please visit the DictyOGlyc web site (http://www.cbs.dtu.dk/services/DictyOGlyc/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of DictyOGlyc result files.

This node executes Disopred, which predicts disordered regions in proteins, via SOAP.
Reference
Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF and Jones DT (2004)
Prediction and functional analysis of native disorder in proteins from the three kingdoms of life, Journal of Molecular Biology, 337, 635-645.
Please visit the Disopred web site (http://bioinf.cs.ucl.ac.uk/index.php?id=806) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Disopred_AIST.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of Disopred result files.

DockingTemplateSelector launches a viewer for selecting a model template file. The user can select only one model file in the viewer, and the absolute path of that file is set on the out-port.

=Ports=
in-port: an absolute path of a directory storing model templates.
out-port: an absolute path of the user-selected file.

FastaFileReader sets the absolute path of a FASTA file on its out-port.

=Options=
Fasta File: set an absolute path of a FASTA file.

=Ports=
out-port: an absolute path of a FASTA file.

Fastapl_AIST executes fastapl, a tool for processing amino acid or nucleotide FASTA data, via SOAP.
Please visit the fastapl web site (http://medals.jp/elist/detail/193.html) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of fastapl.
Input examples are as follows:
Reformat sequence lines to have max line length 100.
fastapl -p -l 100
Truncate sequences to a maximum sequence length of 39.
fastapl -p -e '$seq = substr( $seq, 0, 39 )'
Reverse complement DNA sequences.
fastapl -p -e '$seq = reverse $seq; $seq =~ tr/acgtACGT/tgcaTGCA/'
Reverse complement DNA sequences, including ambiguous codes.
fastapl -p -e '$seq = reverse $seq; $seq =~ tr/ACGTNSWRYKMBDHVacgtnswrykmbdhv/TGCANSWYRMKVHDBtgcanswyrmkvhdb/'
Ambiguous code handling suggested by Martin C. Frith.

=Ports=
in-port: an absolute path of a (Multi-)FASTA format sequence File.
out-port: an absolute path of a result file.
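
For readers less familiar with Perl, the sequence edits in the fastapl examples above can be mirrored in Python. This is only an illustrative sketch of the same string operations, not part of fastapl; the function names are made up for the example.

```python
# Complement table matching the tr/// mapping in the fastapl example above.
_COMPLEMENT = str.maketrans(
    "ACGTNSWRYKMBDHVacgtnswrykmbdhv",
    "TGCANSWYRMKVHDBtgcanswyrmkvhdb",
)


def reverse_complement(seq: str) -> str:
    """Reverse the sequence, then complement each base (ambiguity codes included)."""
    return seq[::-1].translate(_COMPLEMENT)


def truncate(seq: str, max_len: int = 39) -> str:
    """Keep at most max_len residues, like substr($seq, 0, 39)."""
    return seq[:max_len]


print(reverse_complement("AAGC"))  # GCTT
print(reverse_complement("ARN"))   # NYT
```

Characters outside the table are left unchanged by `str.translate`, which matches the behaviour of Perl's `tr///` for unlisted characters.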

This node executes fpocket2, a protein pocket (cavity) detection algorithm based on Voronoi tessellation, via SOAP.
Please visit the fpocket2 web site (http://fpocket.sourceforge.net/) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of fpocket2_AIST.

=Ports=
in-port: an absolute path of a PDB file.
out-port: an absolute path of a directory for storing results of fpocket2_AIST.

HitRegionSelector_AIST selects optimal regions for modelling from (PSI-)BLAST results. The regions are selected under the following conditions:
a) thresholds of coverage, identity and minimum sequence length that the user specifies in the "Configure" dialog.
b) the longest sequence region where several hit regions overlap.

=Options=
Coverage(%): specify the coverage threshold, defined as the percentage of the hit sequence covered by the homologous region. The range is from 50 to 100.
Identity(%): specify the threshold of amino acid identity between aligned regions. The range is from 10 to 100.
Minimum Length: specify the minimum length of a hit sequence that is passed to MODELLER. The lower limit is 26.

=Ports=
in-port: an absolute path of a (PSI-)BLAST result file.
out-port: an absolute path of a hit region selector file.
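
Condition a) above amounts to a simple threshold filter, sketched below. The `Hit` fields, the coverage formula (aligned length divided by hit length) and the use of the aligned length for the minimum-length test are assumptions made for illustration; the node's actual calculation and condition b) are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One (PSI-)BLAST hit region; field names are hypothetical."""
    hit_id: str
    aligned_length: int   # residues of the hit inside the aligned region
    hit_length: int       # full length of the hit sequence
    identity: float       # percent identity over the aligned region


def select_hits(hits, min_coverage=50.0, min_identity=10.0, min_length=26):
    """Keep hits that pass the coverage, identity and length thresholds."""
    # Threshold ranges follow the option descriptions above.
    if not 50 <= min_coverage <= 100:
        raise ValueError("coverage threshold must be between 50 and 100")
    if not 10 <= min_identity <= 100:
        raise ValueError("identity threshold must be between 10 and 100")
    if min_length < 26:
        raise ValueError("minimum length must be at least 26")
    selected = []
    for hit in hits:
        coverage = 100.0 * hit.aligned_length / hit.hit_length
        if (coverage >= min_coverage and hit.identity >= min_identity
                and hit.aligned_length >= min_length):
            selected.append(hit)
    return selected


# Made-up hits: only the first passes all three thresholds.
hits = [
    Hit("1abcA", 120, 150, 45.0),
    Hit("2xyzB", 30, 200, 80.0),
    Hit("3defC", 20, 25, 90.0),
]
print([h.hit_id for h in select_hits(hits, 60, 30, 26)])
```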

HtmlView displays a (result) file as HTML.

=Ports=
in-port: An absolute path of the (result) file.

=Views=
display the (result) file as HTML.

InitMinMM_AIST performs energy minimization and molecular mechanics (MM) calculations via SOAP and returns the MM results.
Input files should be placed in a directory named after each ligand number, as follows:
1/PL.crd
1/PL.pdb
1/PL.top
1/ligand.prep
2/PL.crd
2/PL.pdb
2/PL.top
2/ligand.prep
.
.
.
1: ligand number
PL.crd: protein-ligand complex coordinate file (amber format)
PL.pdb: protein-ligand complex PDB file (for a reference, optional)
PL.top: protein-ligand complex topology file (amber format)
ligand.prep: ligand prep file (amber format)

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of InitMinMM_AIST.

=Ports=
in-port: an absolute path of a tar file of MmPrep result.
out-port: an absolute path of a tar file of MM result.
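
Before packing the input tar file, the per-ligand layout listed above can be checked with a few lines of code. The sketch below is an illustration under assumptions: it works on a plain list of relative path names (for a tar archive these could come from `tarfile.TarFile.getnames()`), and the function name is hypothetical.

```python
REQUIRED = ("PL.crd", "PL.top", "ligand.prep")  # PL.pdb is optional


def missing_inputs(paths):
    """Map each ligand number to the required files it lacks.

    `paths` holds relative names such as "1/PL.crd". Entries whose first
    component is not a number are ignored.
    """
    found = {}
    for path in paths:
        ligand, _, name = path.partition("/")
        if ligand.isdigit() and name:
            found.setdefault(ligand, set()).add(name)
    return {
        ligand: [f for f in REQUIRED if f not in names]
        for ligand, names in sorted(found.items(), key=lambda kv: int(kv[0]))
        if any(f not in names for f in REQUIRED)
    }


# Ligand 1 is complete; ligand 2 lacks its topology and prep files.
paths = ["1/PL.crd", "1/PL.pdb", "1/PL.top", "1/ligand.prep",
         "2/PL.crd", "2/PL.pdb"]
print(missing_inputs(paths))
```

An empty result means every ligand directory found contains all three required files.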

IPknot_AIST executes IPknot, which predicts RNA pseudoknots by maximizing expected accuracy.
Please visit the IPknot web site (http://medals.jp/elist/detail/154.html) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of IPknot.
-IPknot options-
-t th: threshold of base-pairing probabilities for each level
-g gamma: weight for true base-pairs equivalent to -t 1/(gamma+1)(default: -g 2 -g 4)
-e model: probabilistic model (default: McCaskill)
-r n: the number of the iterative refinement (default: 0)
-i: allow isolated base-pairs
-b: output the prediction by BPSEQ format
-P param: read the energy parameter file for the Vienna RNA package

=Ports=
in-port: an absolute path of a FASTA format file (RNA).
out-port: an absolute path of an output file.
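
The relation between the -g and -t options stated above, t = 1/(gamma+1), can be worked through for the default values. The helper name is made up for this example.

```python
def gamma_to_threshold(gamma: float) -> float:
    """Base-pairing probability threshold equivalent to weight gamma."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return 1.0 / (gamma + 1.0)


# The two default weights, one per level.
for g in (2, 4):
    print(f"-g {g} is equivalent to -t {gamma_to_threshold(g):.3f}")
```

With the defaults, the first level therefore uses a threshold of about 0.333 and the second level 0.200.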

JmolForModeller executes Jmol, a molecule viewer application. This node must be connected to a Modeller_AIST node.
Please visit the Jmol web site (http://jmol.sourceforge.net) for further information.

=Ports=
in-port: an absolute path of a directory storing results.

=Views=
A pop up dialog is displayed as follows:
Modeller_AIST: display model numbers and the objective function values.
MergeTargetAndLigand, InitMinMM_AIST: display model numbers and the energy scores.
RASSIE_AIST, Rascal_AIST: display model numbers.
The user can select only one radio button. After selecting a radio button, the user can launch Jmol by pressing the "Execute Jmol" button.

Last_AIST is a SOAP version of the Last KNIME node. LAST is software for comparing and aligning sequences, typically DNA or protein sequences. LAST is similar to BLAST, but it copes better with huge amounts of sequence data. It can also report probabilities for every pair of aligned letters, indicating the reliability of each pairing.
Please visit the Last web site (http://last.cbrc.jp) for further information.

=Options=
Input type: select a sequence type from Protein or DNA.
Target sequence file for comparison: specify a target sequence file for comparison.
Output: specify an output directory for execution result file.
PramAL: specify parameters to execute lastal command.
PramDB: specify parameters to execute lastdb command.
Advanced: set other advanced options (optional).

=Ports=
in-port: an absolute path of an input file.
out-port: absolute paths of output files.

LSDBCrossSearch reads a (Multi-)FASTA file and displays the header lines of all sequences contained in the file. When the user enters search words with search identifiers in the view window of this node and submits them, a web browser opens and an LSDB (Life Science DataBase) cross search is executed on the LSDB web page.
Please visit the LSDB Cross Search web site (http://lifesciencedb.jp/dbsearch/) (Japanese version only) for further information.

=Ports=
in-port: an absolute path of the (Multi-)FASTA file.

=Views=
FASTA Header Lists: displays header lines of all sequences contained in the (Multi-)FASTA file.
LSDB Cross Search: displays a text box where the user can enter search words with search identifiers. After the user specifies search words and clicks the "LSDB cross search" button, a web browser opens and displays the search results.
Search identifiers:
AND: ' '(space) e.g. 'network socket'
OR : '|'(pipe) e.g. 'network | socket'
XOR: '!'(exclamation) e.g. 'network ! socket'
Wild Card: '*'(asterisk) e.g. 'inter*', `sphere`
Priority order: '|' > ' '(space), '!'

LipoP predicts lipoproteins and discriminates between lipoprotein signal peptides, other signal peptides and N-terminal membrane helices in Gram-negative bacteria. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes LipoP on the DTU server.
Please visit the LipoP web site (http://www.cbs.dtu.dk/services/LipoP/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a LipoP result file.

Mafft_AIST executes MAFFT, a multiple alignment program for amino acid or nucleotide sequences, via SOAP.
Please visit the MAFFT web site (http://mafft.cbrc.jp/alignment/software/) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of MAFFT.
-MAFFT options-
--op # : gap opening penalty, default: 1.53
--ep # : offset (works like gap extension penalty), default: 0.0
--maxiterate # : maximum number of iterative refinement, default: 0
--clustalout : output: clustal format, default: fasta
--reorder : outorder: aligned, default: input order
--quiet : do not report progress

=Ports=
in-port: an absolute path of a Multi-FASTA format sequence file. (Because of restrictions in MAFFT, a sequential number is added to each header line in the Multi-FASTA file, and any spaces in the header lines are replaced by underscores "_".)
out-port: an absolute path of a result file.

This node executes MEMSAT, which predicts the secondary structure and topology of all-helix integral membrane proteins based on the recognition of topological models, via SOAP.
Reference
Jones, D.T., Taylor, W.R. and Thornton, J. M. (1994)
Biochemistry. 33:3038-3049.
Please visit the Memsat web site (http://bioinf.cs.ucl.ac.uk/?id=756) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Memsat_AIST.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of Memsat result files.

MergeTargetAndLigand is a node that merges target data and ligand data (PDB format) into a single file.

=Ports=
in-port: absolute paths of a result directory (storing ligand files) and of a target data file.
out-port: an absolute path of a result directory (storing merged files).

MMPrep_AIST generates, via SOAP, the files needed to perform MD, MM and energy minimization, namely coordinate and topology files.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of MMPrep_AIST.

=Ports=
in-port: an absolute path of a directory storing files of a protein (PDB ATOM lines) with a bound ligand (PDB HETATM lines).
out-port: an absolute path of a tar file of MMPrep result.
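
The in-port description distinguishes protein atoms (ATOM lines) from ligand atoms (HETATM lines). The sketch below shows how the two record types can be separated in PDB-format text; it is an illustration only, with a hypothetical function name and made-up coordinates, and does not reproduce what MMPrep_AIST does internally.

```python
def split_complex(pdb_text: str):
    """Return (protein_lines, ligand_lines) from PDB-format text."""
    protein, ligand = [], []
    for line in pdb_text.splitlines():
        record = line[:6]  # the record name occupies columns 1-6
        if record == "ATOM  ":
            protein.append(line)
        elif record == "HETATM":
            ligand.append(line)
    return protein, ligand


# Two made-up records: one protein atom and one ligand atom.
sample = (
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n"
    "HETATM  100  C1  LIG A 201      10.000  12.000   3.000  1.00 30.00           C\n"
    "END"
)
protein, ligand = split_complex(sample)
print(len(protein), len(ligand))
```

Water molecules are also stored as HETATM records, so real input may need further filtering by residue name.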

Modeller_AIST executes MODELLER, which is used for homology or comparative modelling of protein 3D structures. This node must be connected to a RegionSelectorForModeller node.
Please visit the MODELLER web site (http://salilab.org/modeller/) for further information.

=Options=
Number of Models for Modelling: specify the number of structure models predicted by MODELLER. Execution time increases with the number of models.
License Key for MODELLER (required): specify the license key for MODELLER. A license key is needed to use MODELLER. Please obtain one at "http://salilab.org/modeller/registration.html".

=Ports=
in-port: an absolute path of directory containing date and random numbers.
out-port: an absolute path of directory containing date and random numbers.

Mol2FileReader sets the absolute path of a MOL2 file on its out-port.

=Options=
MOL2 File: set an absolute path of a MOL2 file.

=Ports=
out-port: an absolute path of a MOL2 file.

MoltrecMD_AIST performs Moltrec MD via SOAP and returns the MD results.
Input files should be placed in a directory named after each ligand number, as follows:
1/PL.crd
1/PL.pdb
1/PL.top
1/ligand.prep
2/PL.crd
2/PL.pdb
2/PL.top
2/ligand.prep
.
.
.
1: ligand number
PL.crd: protein-ligand complex coordinate file (energy-minimized structure in amber format)
PL.pdb: protein-ligand complex PDB file (for a reference, optional)
PL.top: protein-ligand complex topology file (energy-minimized structure in amber format)
ligand.prep: ligand prep file (amber format)

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of MoltrecMD_AIST.

=Ports=
in-port: an absolute path of a tar file of MM result.
out-port: an absolute path of an MD result file.

This node executes MOPAC (Molecular Orbital PACkage), a semiempirical quantum chemistry program based on Dewar and Thiel's NDDO approximation.
Please visit the MOPAC web site (http://openmopac.net/) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Mopac_AIST.

=Ports=
in-port: an absolute path of a ligand Mol2 file.
out-port: absolute paths of Mopac result out, PDB format and Mol2 format files.

NetCTL predicts CTL epitopes in protein sequences. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes NetCTL on the DTU server.
Please visit the NetCTL web site (http://www.cbs.dtu.dk/services/NetCTL/) for further information.

=Options=
Output: specify an output directory for execution result file.
supertype: select a supertype from the list below:
"A1 supertype"
"A2 supertype"
"A3 supertype"
"A24 supertype"
"A26 supertype"
"B7 supertype"
"B8 supertype"
"B27 supertype"
"B39 supertype"
"B44 supertype"
"B58 supertype"
"B62 supertype"

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a NetCTL result file.

NetChop predicts cleavage sites of the human proteasome using a neural network. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes NetChop on the DTU server.
Please visit the NetChop web site (http://www.cbs.dtu.dk/services/NetChop/) for further information.

=Options=
Output: specify an output directory for execution result file.
Method: specify either "C term 3.0" or "20S 3.0".

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a NetChop result file.

NetNES predicts leucine-rich nuclear export signals (NES) in eukaryotic proteins using a combination of neural networks and hidden Markov models. It has been developed by the Technical University of Denmark (DTU). This node executes NetNES on the DTU server.
Please visit the NetNES web site (http://www.cbs.dtu.dk/services/NetNES/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of NetNES result files.

NetPhosK predicts kinase-specific eukaryotic protein phosphorylation sites using a neural network. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes NetPhosK on the DTU server.
Please visit the NetPhosK web site (http://www.cbs.dtu.dk/services/NetPhosK/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a NetPhosK result file.

NetPhos predicts serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins using a neural network. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes NetPhos on the DTU server.
Please visit the NetPhos web site (http://www.cbs.dtu.dk/services/NetPhos/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of NetPhos result files.

NetPicoRNA predicts cleavage sites of picornaviral proteases using a neural network. This prediction server has been developed by the Technical University of Denmark (DTU). This node executes NetPicoRNA on the DTU server.
Please visit the NetPicoRNA web site (http://www.cbs.dtu.dk/services/NetPicoRNA/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a NetPicoRNA result file.

PDBjMineWeb displays PDBj Mine web, a web interface to PDBj that supersedes xPSSS.
Please visit the PDBj Mine web site (http://www.pdbj.org/doc/help.cgi?Search) for further information.

=Ports=
in-port: "PDB code (4 characters) + chain identifier"."start residue number of a hit region"-"end residue number of the hit region".

=Views=
A list of PDB codes and hit region ranges is shown. The user can select only one radio button. After selecting a radio button, the user can launch a PDBj Mine web search by pressing the "Open PDBj Mine Web" button.
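The in-port string format described above can be split into its fields with a small parser. This is only a sketch: it assumes the pieces are concatenated literally as `<PDB code><chain>.<start>-<end>` with non-negative residue numbers, and the names `HitRegion` and `parse_hit_region` are hypothetical, not part of the node.

```python
import re
from typing import NamedTuple


class HitRegion(NamedTuple):
    pdb_code: str  # 4-character PDB code
    chain: str     # chain identifier
    start: int     # start residue number of the hit region
    end: int       # end residue number of the hit region


# Assumed layout: <4-char PDB code><chain>.<start>-<end>, e.g. "1abcA.10-120".
_HIT_RE = re.compile(r"^([0-9A-Za-z]{4})([0-9A-Za-z]+)\.(\d+)-(\d+)$")


def parse_hit_region(text: str) -> HitRegion:
    """Split one in-port string into PDB code, chain, start and end."""
    match = _HIT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a hit-region string: {text!r}")
    code, chain, start, end = match.groups()
    return HitRegion(code, chain, int(start), int(end))
```

For example, `parse_hit_region("1abcA.10-120")` yields the code `1abc`, chain `A`, and the range 10 to 120. The identifier `1abcA` is a made-up placeholder.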
PdbFileReader sets an absolute path of a PDB file on the out-port.

=Options=
PDB File: set an absolute path of a PDB file.

=Ports=
out-port: an absolute path of a PDB file.
PhylogeneticTree creates a phylogenetic tree by using ClustalW, which constructs a multiple sequence alignment and a phylogenetic tree. This node requires a ClustalW-format multiple alignment file that contains over four sequences.
Please visit the ClustalW web site (http://www.clustal.org/) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of PhylogeneticTree.
Methods: specify either NJ (Neighbor-joining method) or UPGMA (Unweighted Pair Group Method with Arithmetic Mean).
BOOTSTRAP: specify either On or Off. BOOTSTRAP is valid only for NJ.
Number of BOOTSTRAP: specify the number of bootstrap trials. The number is valid only when BOOTSTRAP is "On".

=Ports=
in-port: an absolute path of the multiple alignment file.
out-port: an absolute path of a result file.
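The dependencies among the options above (BOOTSTRAP applies only to NJ, and the bootstrap count only when BOOTSTRAP is On) can be sketched as follows. The function name and the returned dictionary are hypothetical, and treating an inapplicable setting as silently ignored is an assumption about what "valid only" means here.

```python
from typing import Optional


def effective_tree_options(method: str, bootstrap: bool,
                           n_bootstrap: Optional[int] = None) -> dict:
    """Return the settings that actually take effect for a given choice."""
    if method not in ("NJ", "UPGMA"):
        raise ValueError("method must be 'NJ' or 'UPGMA'")
    # Assumption: BOOTSTRAP is ignored (not rejected) when the method is UPGMA.
    use_bootstrap = bootstrap and method == "NJ"
    if use_bootstrap and (n_bootstrap is None or n_bootstrap < 1):
        raise ValueError("a positive bootstrap count is needed when BOOTSTRAP is On")
    return {
        "method": method,
        "bootstrap": use_bootstrap,
        # The count only matters when bootstrapping is really in effect.
        "n_bootstrap": n_bootstrap if use_bootstrap else None,
    }
```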
A PhylogeneticTreeView node executes Archaeopteryx, which is a Java application based on forester libraries, to display an annotated phylogenetic tree.
Please visit the Archaeopteryx web site (http://www.phylosoft.org/archaeopteryx/) for further information.

=Ports=
in-port: an absolute path of a PhylogeneticTree result file.
The PocketSelector node launches a viewer to select a pocket site. The user can select only one pocket site in the viewer.

=Ports=
in-port: an absolute path of a directory storing Qsite results.
out-port: an absolute path of a directory storing Qsite results.
Poodle_AIST is a SOAP version of the POODLE KNIME node. POODLE (Prediction Of Order and Disorder by machine LEarning) is a system that predicts disorder regions using the amino acid sequence alone. The node implements two prediction methods: short disorder region prediction (POODLE-S) and long disorder region prediction (POODLE-L).
Please visit the POODLE web site (http://medals.jp/elist/detail/62.html) for further information.

=Options=
Type: select a disorder prediction method from POODLE-S or POODLE-L.
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of the input file.
out-port: an absolute path of the output file.
Psipred_AIST executes PSIPRED via SOAP. PSIPRED is a secondary structure (helix, sheet and coil) prediction method for a protein sequence.
Please visit the PSIPRED web site (http://bioinf.cs.ucl.ac.uk/psipred/) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of PsiPred_AIST.

=Ports=
in-port: an absolute path of a FASTA-format sequence file.
out-port: an absolute path of a result file.
Raccess_AIST executes Raccess, which computes the accessibility of all the segments of a fixed length for a given RNA sequence when the maximal distance between base pairs is limited to a fixed size W.
Please visit the Raccess web site (http://medals.jp/elist/detail/155.html) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Raccess.
-Raccess options-
-access_len=(integer) : contiguous length over which the transcript is accessible [default: 50].
-bind_range=(first):(last) : for each segment of accessibility computation, the binding energy between the region [first, last] (in 1-based, inclusive-end, coordinates relative to the segment) with a complementary DNA/RNA fragment is calculated [default: none].
-bind_dna=(bool) : if true, binding energy is computed for a complementary DNA fragment; if false, binding energy is computed for a complementary RNA fragment [default: true].
-max_pair_dist=(integer) : maximal span of base pairs considered [default: 100].
-energy_thr=(double) : only output the results below the specified energy threshold (unit: kcal/mol) [default: 100].

=Ports=
in-port: an absolute path of a single FASTA format file (RNA).
out-port: an absolute path of an output file.
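The Raccess options and defaults listed above can be assembled into option strings as in the following sketch. The function name is hypothetical, the `true`/`false` spelling of the boolean is an assumption, and the check that `bind_range` lies within `1..access_len` is an inference from "coordinates relative to the segment", not a documented rule.

```python
from typing import List, Optional, Tuple


def raccess_option_strings(access_len: int = 50,
                           bind_range: Optional[Tuple[int, int]] = None,
                           bind_dna: bool = True,
                           max_pair_dist: int = 100,
                           energy_thr: float = 100.0) -> List[str]:
    """Format the documented options as "-name=value" strings."""
    opts = [f"-access_len={access_len}"]
    if bind_range is not None:
        first, last = bind_range
        # Assumed constraint: 1-based, inclusive, inside one segment.
        if not 1 <= first <= last <= access_len:
            raise ValueError("bind_range must satisfy 1 <= first <= last <= access_len")
        opts.append(f"-bind_range={first}:{last}")
    opts.append(f"-bind_dna={'true' if bind_dna else 'false'}")
    opts.append(f"-max_pair_dist={max_pair_dist}")
    opts.append(f"-energy_thr={energy_thr}")
    return opts
```

Leaving `bind_range` unset mirrors its documented default of none, so no `-bind_range` string is produced in that case.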
RactIP_AIST executes RactIP, which predicts RNA-RNA interaction using integer programming.
Please visit the RactIP web site (http://medals.jp/elist/detail/153.html) for further information.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of RactIP.
-RactIP options-
-p: do not use the constraints for internal pseudoknots.
-a alpha: weight for hybridization probabilities (default: 0.5).
-t th_bp: threshold of base-pairing probabilities (default: 0.5).
-u th_hy: threshold of hybridization probabilities (default: 0.2).
-m: use McCaskill model (default: CONTRAfold model).
-i: allow isolated base-pairs.

=Ports=
in-port0(top): an absolute path of a single FASTA format file (RNA).
in-port1(bottom): an absolute path of a single FASTA format file (RNA).
out-port: an absolute path of an output file.
Rascal_AIST executes Rascal via SOAP. Rascal is a tool that predicts the tertiary structure of RNA with a fragment assembly algorithm, following a given secondary structure. Rascal can predict several RNA-RNA interacting structures such as kissing loops.

=Options=
Select Output Directory: specify an absolute path of a directory for storing results of Rascal.

=Ports=
in-port: an absolute path of a RactIP result file.
out-port: an absolute path of a Rascal result directory.
RASSIE_AIST executes RASSIE (RNA Assembler using Secondary Structure Information Effectively) via SOAP. RASSIE is a tool for predicting RNA tertiary structures using known secondary structure information.

=Options=
Select Output Directory: specify an output directory for execution result file.
-RASSIE options-
-q Nstruct
-ins insertion_num
-clst -outclst n
-ins_chain

=Ports=
in-port: an absolute path of a result file of RNA secondary structure prediction.
out-port: absolute paths of RASSIE result files.
ResultPathSetter is used to display prediction results of KNIME nodes executed in the past. In the configuration pane, the user can set an absolute path of the directory storing the prediction results, the name of the KNIME node that produced them, and a sequence name (for Poodle_AIST and PsiPred_AIST). The out-port then carries information in the same form as the out-port of the specified KNIME node. By connecting this out-port to the in-port of an appropriate viewer KNIME node, the user can access the results in that viewer.
Available KNIME nodes and their viewer nodes are as follows:
"Ammos_AIST" --- HtmlView
"AutoDock_AIST" --- JmolForModeller, HtmlView
"BlastForModeller_AIST" --- HtmlView
"Blast_NCBI" --- AISTViewer, HtmlView
"CentroidFold_AIST" --- AISTViewer, HtmlView
"ChloroP_DTU" --- HtmlView
"ClustalW_AIST" --- AISTViewer, HtmlView
"CompoundQuery_AIST - Namiki" --- HtmlView, CompoundSelector
"CPHModels_DTU" --- JmolForModeller, HtmlView, DockingTemplateSelector
"DictyOGlyc_DTU" --- HtmlView
"Disopred_AIST" --- AISTViewer, HtmlView
"Fastapl_AIST" --- HtmlView
"fpocket2_AIST" --- JmolForModeller
"HitRegionSelector_AIST" --- HtmlView
"InitMinMM_AIST" --- JmolForModeller
"IPknot_AIST" --- HtmlView
"Last_AIST" --- AISTViewer, HtmlView
"Lipo_DTU" --- HtmlView
"Mafft_AIST" --- AISTViewer, HtmlView
"Memsat_AIST" --- AISTViewer, HtmlView
"Modeller_AIST" --- JmolForModeller
"MoltrecMD_AIST" --- HtmlView
"Mopac_AIST" --- HtmlView
"NetCTL_DTU" --- HtmlView
"NetChop_DTU" --- HtmlView
"NetNES_DTU" --- HtmlView
"NetPhos_DTU" --- HtmlView
"NetPhosK_DTU" --- HtmlView
"NetPicoRNA_DTU" --- HtmlView
"PhylogeneticTree_AIST" --- PhylogeneticTreeView, HtmlView
"Poodle_AIST" --- AISTViewer, HtmlView
"PsiPred_AIST" --- AISTViewer, HtmlView
"RASSIE_AIST" --- JmolForModeller
"Raccess_AIST" --- HtmlView
"RactIP_AIST" --- HtmlView
"Rascal_AIST" --- JmolForModeller
"SecretomeP_DTU" --- HtmlView
"SignalP_DTU" --- HtmlView
"Sparql_AIST" --- HtmlView, SequenceSelector
"Sparql_AIST_Adv" --- HtmlView
"TargetP_DTU" --- HtmlView
"TmHmm_DTU" --- HtmlView
"WolfPsort_AIST" --- HtmlView

=Options=
result directory path: specify an absolute path of the directory storing the prediction results. The user can check the path by using the "Interactive Table" node in "Data Views -> Utility" of the Node Repository. Please connect the "Interactive Table" node to the KNIME nodes listed above.
KNIME node: select a KNIME node name corresponding to the specified directory path.
Sequence name (for Poodle_AIST and PsiPred_AIST): specify a sequence name (the word "query" is set if you do not specify one).

=Ports=
out-port: information in the same form as the out-port of the KNIME node the user specified.
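The node-to-viewer pairs listed for ResultPathSetter behave like a lookup table. The sketch below copies only a handful of rows from that list for illustration. The names `VIEWERS` and `viewers_for` are hypothetical and do not come from the node itself.

```python
from typing import Dict, List

# A few rows copied from the list of available KNIME nodes; not the full table.
VIEWERS: Dict[str, List[str]] = {
    "AutoDock_AIST": ["JmolForModeller", "HtmlView"],
    "CentroidFold_AIST": ["AISTViewer", "HtmlView"],
    "PhylogeneticTree_AIST": ["PhylogeneticTreeView", "HtmlView"],
    "RASSIE_AIST": ["JmolForModeller"],
    "Sparql_AIST": ["HtmlView", "SequenceSelector"],
}


def viewers_for(node_name: str) -> List[str]:
    """Return the viewer nodes that can display results of node_name."""
    try:
        return VIEWERS[node_name]
    except KeyError:
        raise ValueError(f"no viewer registered for {node_name!r}") from None
```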
RNA2DChecker_AIST checks, via SOAP, whether an RNA 2D structure is suitable for executing RASSIE.

=Ports=
in-port: an absolute path of a result file of RNA secondary structure prediction.
out-port: an absolute path of a result file of RNA secondary structure prediction.
SecretomeP predicts non-classical (i.e. not signal peptide triggered) protein secretion. This prediction server was developed by the Technical University of Denmark (DTU). This node executes SecretomeP on the DTU server.
Please visit the SecretomeP web site (http://www.cbs.dtu.dk/services/SecretomeP/) for further information.

=Options=
Output: specify an output directory for execution result file.
Organism Type: specify an organism type from the list below:
Gram-negative bacteria
Gram-positive bacteria
Mammalian

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a SecretomeP result file.
This node selects a sequence from SPARQL results.

=Ports=
in-port: an absolute path of a SPARQL result file.
out-port: an absolute path of the selected sequence file in FASTA format.
A SetVariable node sets an active flow variable output port by specifying integer 0, 1, or 2 in this node's Configure dialog.

=Dialog Options=
Active flow variable output port number:
0: first flow variable output port
1: second flow variable output port
2: third flow variable output port

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks. This prediction server was developed by the Technical University of Denmark (DTU). This node executes SignalP on the DTU server.
Please visit the SignalP web site (http://www.cbs.dtu.dk/services/SignalP/) for further information.

=Options=
Output: specify an output directory for execution result file.
Organism Type: specify an organism type from the list below:
Eukaryotes
Gram-negative bacteria
Gram-positive bacteria

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of SignalP result files.
SiteAndPoseSelector launches a viewer to select a docking result directory by specifying a docking site. The user can select only one result directory in the viewer, and the absolute path of a file containing the selected result directory path is set on the out-port.

=Ports=
in-port: an absolute path of a directory storing docking results.
out-port: an absolute path of a file containing the result directory selected in the viewer.
This node executes a SPARQL search against each SPARQL endpoint (fRNAdb, SEVENS, UNIPROT (reviewed human), PDB (100% identity non-redundant) and KEGG - pathway) using keywords, species names (not available for UNIPROT), minimum and maximum sequence length thresholds, and Resolution (for PDB). The user can output the SPARQL results as a FASTA-format file (for the "SequenceSelector" node) or as a Tab-delimited file.
The user can also enter a SPARQL query in the "Input SPARQL Query" text area. If a query is entered there, all other options except "Output directory" are ignored.

=Options=
Output directory: specify an absolute path of directory to store SPARQL results.
Sparql endpoints: specify SPARQL endpoints.
Species name: specify (a) species name(s) as search parameters.
Keyword: specify (a) keyword(s) as search parameters (not available for UNIPROT).
Minimum sequence length: specify a minimum sequence length threshold as a search parameter.
Maximum sequence length: specify a maximum sequence length threshold as a search parameter.
Resolution: specify a resolution (for PDB) as a search parameter.
Pathway: specify a pathway (for KEGG-pathway) as a search parameter.
Output format: specify either FASTA or Tab-delimited.
Advanced: input a SPARQL query.

=Ports=
out-port: an absolute path of a FASTA-format or Tab-delimited output file.
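The length thresholds and the two output formats can be illustrated with a short sketch. It assumes each search hit reduces to an (identifier, sequence) pair and that both thresholds are inclusive. Neither detail is stated for the node, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (identifier, sequence); an assumed record shape


def filter_by_length(records: Iterable[Record],
                     min_len: Optional[int] = None,
                     max_len: Optional[int] = None) -> List[Record]:
    """Keep records whose sequence length lies within [min_len, max_len]."""
    kept = []
    for ident, seq in records:
        if min_len is not None and len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        kept.append((ident, seq))
    return kept


def to_fasta(records: Iterable[Record]) -> str:
    """One '>' header line plus one sequence line per record."""
    return "".join(f">{ident}\n{seq}\n" for ident, seq in records)


def to_tab_delimited(records: Iterable[Record]) -> str:
    """One tab-separated line per record."""
    return "".join(f"{ident}\t{seq}\n" for ident, seq in records)
```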
This node executes SPARQL using a user-specified SPARQL query and endpoint.

=Options=
Output directory: specify an absolute path of directory to store SPARQL results.
Endpoint: specify an endpoint.
Advanced: input a SPARQL query.

=Ports=
out-port: an absolute path of a Tab-delimited output file.
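For readers unfamiliar with how a query and an endpoint combine, the sketch below prepares a GET request in the style of the SPARQL 1.1 Protocol, where the query travels in a `query` URL parameter. It is not the node's implementation. The endpoint URL in the usage note is a placeholder, and asking for tab-separated results via the `Accept` header is an assumption that a given endpoint may not honour.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a GET request carrying a SPARQL query."""
    # Append the query as a URL parameter, respecting an existing query string.
    separator = "&" if "?" in endpoint else "?"
    url = endpoint + separator + urlencode({"query": query})
    # Request tab-separated results; endpoints may answer in another format.
    return Request(url, headers={"Accept": "text/tab-separated-values"})
```

A request built with `build_sparql_request("http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")` could then be sent with `urllib.request.urlopen`.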
TargetP predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP). This prediction server was developed by the Technical University of Denmark (DTU). This node executes TargetP on the DTU server.
Please visit the TargetP web site (http://www.cbs.dtu.dk/services/TargetP/) for further information.

=Options=
Output: specify an output directory for execution result file.
Organism Type: specify an organism type from the list below:
Non-Plant
Plant

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: an absolute path of a TargetP result file.
TemplateSelector_AIST separates out hit regions that are highly similar to existing PDB structures. If a hit region is judged to be highly similar to a PDB structure, MODELLER is not executed for it. The evaluation criteria for separating hit regions are the coverage and identity thresholds.

=Options=
Coverage threshold: specify the coverage threshold used to decide between modelling and displaying PDBj Mine.
Coverage(%) = (hit region length / PDB sequence length) * 100
Identity threshold: specify the identity threshold used to decide between modelling and displaying PDBj Mine.
Identity(%) = (number of identical amino acids / aligned region length) * 100
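The two formulas above translate directly into code. The sketch below only computes the percentages. How the node combines them with the thresholds is not specified here, so no decision logic is shown, and the function names are hypothetical.

```python
def coverage_percent(hit_region_length: int, pdb_sequence_length: int) -> float:
    """Coverage(%) = (hit region length / PDB sequence length) * 100."""
    if pdb_sequence_length <= 0:
        raise ValueError("PDB sequence length must be positive")
    return hit_region_length / pdb_sequence_length * 100


def identity_percent(identical_residues: int, aligned_region_length: int) -> float:
    """Identity(%) = (identical amino acids / aligned region length) * 100."""
    if aligned_region_length <= 0:
        raise ValueError("aligned region length must be positive")
    return identical_residues / aligned_region_length * 100
```

For instance, a 90-residue hit region on a 120-residue PDB sequence gives a coverage of 75%, and 45 identical residues over a 90-residue aligned region give an identity of 50%.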

=Ports=
in-port: an absolute path of a hit region selector file.
out-port0(top): if there are hit regions for modelling, an absolute path of a directory containing date and random numbers; if there are none, "no hits".
out-port1(bottom): if there are regions highly similar to existing PDB structures, "PDB code (4 characters) + chain identifier"."start residue number of a hit region"-"end residue number of the hit region"; if there are none, "No PDB code information found to display PDB structures via PDBj Mine".
TMHMM predicts transmembrane helices in proteins. This prediction server was developed by the Technical University of Denmark (DTU). This node executes TMHMM on the DTU server.
Please visit the TMHMM web site (http://www.cbs.dtu.dk/services/TMHMM/) for further information.

=Options=
Output: specify an output directory for execution result file.

=Ports=
in-port: an absolute path of a protein sequence file.
out-port: absolute paths of TMHMM result files.
WolfPsort_AIST executes WoLF-PSORT via SOAP. WoLF-PSORT predicts the subcellular localization sites of proteins based on their amino acid sequences. The method, which is a major extension to the venerable PSORTII program, makes predictions based on both known sorting signal motifs and some correlative sequence features such as amino acid content. Like PSORT and PSORTII, WoLF PSORT displays some information about detected sorting signals, which helps users judge the reliability of the prediction in specific cases.
Please visit the WoLF-PSORT web site (http://medals.jp/elist/detail/63.html) for further information.

=Options=
Kingdom: select a kingdom (animal, plant or fungi) corresponding to an input sequence.
Output: specify an absolute path of a directory for storing a result file.

=Ports=
in-port: an absolute path of a FASTA format sequence file.
out-port: an absolute path of a result file.
SOAP EXECUTION ERROR (GENERAL)
The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
SOAP execution error. Please check your input file. If you have any questions, please let us know (workflow@medals.jp).

=Nodes=
All nodes executed via SOAP.
SOAP EXECUTION ERROR (BUSY)
This error occurs when the SOAP server is busy. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
Sorry, system is busy. Please try later.

=Nodes=
All nodes executed via SOAP.
SOAP EXECUTION ERROR (TIME OUT)
This error occurs when the execution time exceeds the allowed time. The allowed time differs by node and is set to at least three hours. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
Time out error has occured. "program name" program failed in calculating in time.

=Nodes=
All nodes executed asynchronously via SOAP.
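A time-out on an asynchronous job is commonly implemented by polling until a deadline passes. The sketch below shows that generic pattern only. It is an assumption about how such a client could work, not a description of the SOAP nodes, and `is_done`, `poll_s` and the other names are hypothetical.

```python
import time
from typing import Callable


def wait_until_done(is_done: Callable[[], bool],
                    timeout_s: float,
                    poll_s: float = 5.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if is_done() succeeds before the deadline, else False."""
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False  # caller would report the time-out error here
        sleep(poll_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting.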
SOAP EXECUTION ERROR (FILE SIZE)
This error occurs when the total size of the user input files exceeds 32MB. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
The total file size is "total size" bytes. Maximum total size is 32MB.

=Nodes=
All nodes executed via SOAP.
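A client-side check before submission could avoid this error. The sketch below assumes "32MB" means 32 × 1024 × 1024 bytes, since the exact byte count is not stated, and the function names are hypothetical.

```python
import os
from typing import Iterable

MAX_TOTAL_BYTES = 32 * 1024 * 1024  # assumption: "32MB" read as 32 MiB


def exceeds_limit(total_bytes: int) -> bool:
    """True if the combined input size is over the limit."""
    return total_bytes > MAX_TOTAL_BYTES


def check_input_files(paths: Iterable[str]) -> int:
    """Sum the sizes of the input files and fail if the total is too large."""
    total = sum(os.path.getsize(path) for path in paths)
    if exceeds_limit(total):
        raise ValueError(
            f"The total file size is {total} bytes. Maximum total size is 32MB."
        )
    return total
```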
SETTING ERROR (MULTI-FASTA)
This error occurs when a multi-FASTA file is entered into a node that only permits a single FASTA format. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
A Multi-FASTA file is not permitted. Please input a single FASTA file.

=Nodes=
All nodes that only permit a single FASTA format as input.
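Whether a file is single FASTA or multi-FASTA can be checked by counting header lines, since each FASTA record starts with a line beginning with `>`. A minimal sketch with hypothetical function names:

```python
def count_fasta_records(text: str) -> int:
    """Count records by counting lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def is_single_fasta(text: str) -> bool:
    """True only when the text holds exactly one FASTA record."""
    return count_fasta_records(text) == 1
```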
SETTING ERROR (SEQUENCE LENGTH)
This error occurs when the length of an input sequence exceeds the allowed size. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
Sequence length limit is "sequence length"aa. Please input more short sequence.

=Nodes=
All nodes that have an allowed length of sequence.
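The sequence length can be checked locally before running a node. The sketch below counts residues in a single-FASTA text, skipping the header and whitespace. The limit is passed in by the caller because it varies by node and no values are given here, and the function names are hypothetical.

```python
def fasta_sequence_length(text: str) -> int:
    """Number of residues in a single-FASTA text (header lines excluded)."""
    return sum(
        len(line.strip())
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def within_length_limit(text: str, limit: int) -> bool:
    """True if the sequence is no longer than the node's limit."""
    return fasta_sequence_length(text) <= limit
```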
SETTING ERROR (FILE NOT FOUND)
This error occurs when your query file is not found. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
Your setting file does not exist.

=Nodes=
AlignmentFileReader, FastaFileReader, Mol2FileReader and PdbFileReader nodes.
EXECUTION ERROR (NO HIT)
This error occurs when no search results are found. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
No results found. Please change your query conditions.

=Nodes=
CompoundQuery_AIST node.
SPARQL ERROR (NO HIT)
This error occurs when no SPARQL search results are found. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
SPARQL RESULTS: 0 data hits. No hit. Please change your search conditions.


=Nodes=
All Sparql nodes.
SPARQL ERROR (INVALID ENDPOINT)
This error occurs when SPARQL endpoints are invalid. The KNIME node execution is stopped and an error message is displayed in a pop-up window.

=Message=
Plase check SPARQL endpoints.


=Nodes=
All Sparql nodes.

 SOAP/REST services

The user executes programs on servers at AIST and receives the results through a SOAP interface. (Client programs that use the SOAP interface are required.)