WWW-QUERY & CROSS-TAXA HELP
Use of the buttons
In the following order, you should:
- Select the type of search (for WWW-Query only). Use the button "Search for sequences" to retrieve a set of sequences, or "Search for families" to retrieve a set of families. If you select the "family" button, only the databases containing gene families will be available in the list.
- Select the type of sequences. Use the button "Protein" to select databases that contain proteins, or the button "Nucleotide" to select databases that contain nucleotide sequences.
- Choose the database.
Databases
This field lets you select one of the proposed databases with a radio button. Use the button 'protein databank' or 'nucleotide databank' to select databases containing protein sequences (Swiss-Prot, etc.) or nucleotide sequences (EMBL, GenBank, etc.). Some databases (HoGenom, etc.) provide both a protein and a nucleotide database. The databases and their current contents are described here.
WWW-Query Selection Criteria
Many selection criteria are available. Most of them correspond to the structured elements of the sequence documentation in the data banks. These selection criteria (Keyword, Keywords list, Name, Names list, Accession number, Accession number list, Species, Family ID, Family ID list, Type, Year, Organelle, Molecule, Reference, Author, Journal, Status) can be combined with logical operators.
Keyword
Enter the keyword name, using * to stand for any series of characters so that
several keywords are matched in one query. Spaces are allowed. Examples: RNA POLYMERASE,
*POLYMERASE, *TRANSFER RNA*SYNTHETASE*. Keywords are partially tree-structured.
Any match also catches all keywords placed below it in the tree.
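The effect of the * wildcard can be illustrated with a short Python sketch. It is not the server's implementation: the helper names and the sample keyword list are hypothetical, and the sketch assumes that * is the only special character and that matching is case-sensitive. It does not model the keyword tree.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a query such as '*POLYMERASE' into a compiled regular expression."""
    # Escape every literal piece, then let each '*' stand for any run of characters.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))

def match_keywords(pattern, keywords):
    """Return the keywords that match the whole pattern, in their original order."""
    regex = wildcard_to_regex(pattern)
    return [kw for kw in keywords if regex.fullmatch(kw)]

# Hypothetical keyword index, for illustration only.
KEYWORDS = [
    "RNA POLYMERASE",
    "DNA POLYMERASE",
    "TRANSFER RNA",
    "PHENYLALANINE-TRANSFER RNA SYNTHETASE",
]

print(match_keywords("RNA POLYMERASE", KEYWORDS))
print(match_keywords("*POLYMERASE", KEYWORDS))
print(match_keywords("*TRANSFER RNA*SYNTHETASE*", KEYWORDS))
```

A pattern without any * only matches the identical keyword, while a leading or trailing * widens the match on that side.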
Keywords list
Give the name of a keywords list that you have transferred to the server
with the send a list utility.
Name
Enter a sequence name, possibly using * to match any string of characters.
Use of * is VERY slow when placed at the beginning of the query,
otherwise the reply is fast. Examples: ECTRPA, ECTRP*.
Names list
Enter the name of a list containing sequence names. For that purpose you
may either use a list previously created with WWW-Query or a list you have sent
with the send a list utility.
Accession number
Enter an accession number. Example: L04470. All accession numbers
listed in sequence annotations are indexed.
Accession number list
Enter the name of a list containing sequence accession numbers that you have
transferred to the server with the send a list utility.
Species
Enter the species name, using * to stand for any series of characters so that
several species are matched in one query. Spaces are allowed. Examples: ESCHERICHIA COLI,
*COLI, E*COLI. Species names are tree-structured according to the biological
classification of species.
Family ID
Enter a family name, possibly using * to match any string of characters.
Use of * is VERY slow when placed at the beginning of the query,
otherwise the reply is fast. Examples: HBG000020, HBG00002*.
Family ID list
Enter the name of a list containing family names. For that purpose you
may either use a list previously created with WWW-Query or a list you have sent
with the send a list utility.
Type
Sequence type identifies the nature of the encoded molecule (e.g., protein,
tRNA, rRNA). Type should not be confused with molecule which denotes the chemical
nature of the sequenced molecule (e.g., DNA, mRNA, tRNA). Type is defined
only for the nucleotide sequence banks (GenBank, EMBL, Hovergen, NRSub and CGDB).
Presently the existing types are:
ID                  Locus entry (EMBL, SWISS-PROT, NRSub)
LOCUS               Locus entry (GenBank, Hovergen, EMGLib)
CDS        .PE      protein coding region (all)
RRNA       .RR      mature ribosomal RNA (all)
TRNA       .TR      mature transfer RNA (all)
MISC_RNA   .RN      other structural RNA coding region (EMBL, GenBank, Hovergen, NRSub, EMGLib)
SNRNA      .SN      small nuclear RNA (EMBL, GenBank, Hovergen, EMGLib)
SCRNA      .SC      small cytoplasmic RNA (EMBL, GenBank, Hovergen, NRSub, EMGLib)
3'INT      .3I      3' intron (Hovergen)
3'NCR      .3F      3' non-coding region (Hovergen)
5'INT      .5I      5' intron (Hovergen)
5'NCR      .5F      5' non-coding region (Hovergen)
CPG        .CG      region > 200 bp with CpGobs/CpGexp > 0.5 (Hovergen)
INT_INT    .IN      internal intron (Hovergen)
Each entry of a FEATURE TABLE describing a coding region of a DNA
fragment gives rise to a subsequence equal to the fragments described in the
location of the feature. The type of the resulting subsequence equals the key of
the corresponding feature table entry. The name of the resulting subsequence is
built by adding to the parent sequence's name an extension uniquely identifying
this particular feature.
Sequences of a given type are generally subsequences, i.e., fragments of
parent sequences, except if the coding region covers totally the parent sequence,
in which case ACNUC does not create a subsequence.
Year
Type the desired year of publication in the box, with one of these operations:
> (after), = (this year), or < (before).
Organelle
Organelle (e.g., chloroplast, mitochondrion) denotes the nature of the genome
that harbors a particular gene. By extension, WWW-Query also sees `nuclear' as
an organelle. Also, a nuclear-encoded gene coding for a protein imported to an
organelle is seen as a nuclear gene by WWW-Query. The existing organelles are:
CHLOROPLAST      Chloroplast genome (EMBL, GenBank, NBRF, Hovergen)
MITOCHONDRION    Mitochondrial genome (EMBL, GenBank, NBRF, Hovergen)
KINETOPLAST      Kinetoplast genome (EMBL, GenBank, Hovergen)
NUCLEAR          Nuclear genome (all)
Molecule
In ACNUC, molecule denotes the chemical nature of the sequenced molecule
(e.g., DNA, mRNA, tRNA). Molecule should not be confused with type which
identifies the encoded molecule (e.g., protein, tRNA, rRNA). Thus the
sequence of a tRNA gene has DNA for molecule because DNA rather than tRNA was
sequenced. The subsequence covering the tRNA region has tRNA for type because
this is the nature of the encoded product. Molecule is defined only for the
nucleotide sequence banks (GenBank, EMBL, Hovergen, NRSub, and CGDB).
Presently the existing molecules are:
DNA     Sequenced molecule is DNA (all)
RNA     Sequenced molecule is RNA (all)
MRNA    Sequenced molecule is mRNA (GenBank, Hovergen)
RRNA    Sequenced molecule is rRNA (GenBank, Hovergen)
TRNA    Sequenced molecule is tRNA (GenBank, Hovergen)
URNA    Sequenced molecule is snRNA (GenBank, Hovergen)
Reference
Enter the reference name. References are specified as follows depending on
the type of document:
Document                     Format                          Example
Journal article              journal_code/volume/1st_page    jme/34/17
Book                         book/year/1st_author            book/1980/broker
Thesis                       thesis/year/1st_author          thesis/1984/wildgruber
Patent                       patent/patent_coded_number      patent/ep0238993
Unpublished, or submitted    unpubl/year/1st_author          unpubl/1993/cho
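As an illustration of the formats above, the following Python sketch assembles reference names from their parts. The function names are hypothetical, and converting the result to lower case is an assumption based only on the examples shown; the server may accept other spellings.

```python
def journal_reference(journal_code, volume, first_page):
    """Build a journal article reference: journal_code/volume/1st_page."""
    return f"{journal_code}/{volume}/{first_page}".lower()

def authored_reference(kind, year, first_author):
    """Build a book, thesis or unpublished reference: kind/year/1st_author."""
    if kind not in {"book", "thesis", "unpubl"}:
        raise ValueError(f"unsupported document kind: {kind}")
    return f"{kind}/{year}/{first_author}".lower()

def patent_reference(coded_number):
    """Build a patent reference: patent/patent_coded_number."""
    return f"patent/{coded_number}".lower()

print(journal_reference("jme", 34, 17))
print(authored_reference("book", 1980, "Broker"))
print(authored_reference("thesis", 1984, "Wildgruber"))
print(patent_reference("EP0238993"))
print(authored_reference("unpubl", 1993, "Cho"))
```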
Author
Enter an author name, possibly using * to match any string of characters
(slow). Examples: YANOFSKI, YANOF*. Only last names are indexed; initials
are ignored. All authors of journal articles are indexed. Only the first author of
books, theses, patents and other documents is indexed.
Journal
Enter a journal code.
Status
Status denotes the completion level of sequence annotations. This
information exists only in the data banks in EMBL or SWISS-PROT
formats.
The existing statuses are:
PRELIMINARY    Preliminary annotated sequence
STANDARD       Fully annotated sequence
UNANNOTATED    Only DE, AC and R[NPXATL]
UNREVIEWED     Sequence with unreviewed annotation
Logical Operators
Elementary selection criteria (e.g., by species, by keyword) may be
logically combined to create multi-criterion queries using operators.
Only one operator is available for the first selection criterion: NOT.
The default option, DEFAULT, has no effect on the query and is present
only for aesthetic purposes. For the three other criteria, four operators are
available: AND, OR, AND NOT, OR NOT.
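The operators can be pictured as set operations on the lists of sequences selected by each criterion. The Python sketch below is only an illustration: the function name and the sample data are hypothetical, and evaluating the criteria strictly from first to last is an assumption, since the order of evaluation is not stated here.

```python
def evaluate(universe, criteria):
    """Combine (operator, selected_set) pairs from first to last.

    universe is the set of all sequences in the database; it is needed
    to compute the complement used by NOT and OR NOT.
    """
    first_op, first_set = criteria[0]
    if first_op == "DEFAULT":
        result = set(first_set)
    elif first_op == "NOT":
        result = universe - first_set
    else:
        raise ValueError(f"operator not allowed on first criterion: {first_op}")

    for op, selected in criteria[1:]:
        if op == "AND":
            result = result & selected
        elif op == "OR":
            result = result | selected
        elif op == "AND NOT":
            result = result - selected
        elif op == "OR NOT":
            result = result | (universe - selected)
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

# Hypothetical database content and elementary selections.
ALL_SEQUENCES = {"SEQ_A", "SEQ_B", "SEQ_C", "SEQ_D"}
by_species = {"SEQ_A", "SEQ_B"}
by_keyword = {"SEQ_B", "SEQ_C"}

print(sorted(evaluate(ALL_SEQUENCES, [("DEFAULT", by_species), ("AND", by_keyword)])))
print(sorted(evaluate(ALL_SEQUENCES, [("DEFAULT", by_species), ("AND NOT", by_keyword)])))
print(sorted(evaluate(ALL_SEQUENCES, [("NOT", by_species)])))
```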
Cross-Taxa Search
The left page ("Taxon selection") is used to build a Cross-Taxa query, which retrieves all gene families that are shared by a given set of taxa (the upper list) and that are not associated with another set of taxa (the lower list).
The right page ("Taxonomy helper") can be used to check the taxonomy of the species of interest.
Cross-Taxa retrieves all gene families that are shared, strictly or not, by a first set of taxa defined in the first field and that are not associated with a second set of taxa defined in the second field. Any taxonomic levels can be used and mixed to compose the query (e.g., Homo sapiens, Primates, Mammalia). For example, it is possible to retrieve the families of bacterial genes specific to a toxic strain of Escherichia coli, the gene families found in mammals but not in birds, or the gene families found in mammals only.
The first set of taxa can be used for an inclusive or
exclusive selection of families.
It is also possible to pre-select the families by their number of sequences or species,
as shown in this example.
Warning! Cross-Taxa queries can take a long time. For simple queries on families (for example, to retrieve all the families containing a sequence from Mammalia), we recommend using WWW-Query.
Two types of search are available:
Inclusive Search:
Any family containing at least one species from each taxon of the list will be selected.
Usage:
- if you specify Primates in list 1 (with an empty list 2), you will get all the families with at least one sequence from Primates.
- if you specify Homo and Mus in list 1 (with an empty list 2), you will get all the families with at least one sequence from Homo and one sequence from Mus (for example a 3-sequence family with one sequence from Homo, one sequence from Mus and one sequence from Bos).
- if you specify Mammalia in list 1 and Primates in list 2, you will get all the families with at least one sequence from Mammalia but no sequence from Primates (for example a 15-sequence family with 5 sequences from Bos, 5 sequences from Mus, 2 sequences from Rattus and 3 sequences from Xenopus).
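The inclusive rule described above can be sketched in Python as follows. This is a toy model, not the Cross-Taxa implementation: the LINEAGE table is a hypothetical, heavily simplified taxonomy, and families are reduced to the set of genera they contain.

```python
# Hypothetical, simplified taxonomy: each genus maps to the taxa it belongs to.
LINEAGE = {
    "Homo": {"Homo", "Primates", "Mammalia"},
    "Pan": {"Pan", "Primates", "Mammalia"},
    "Mus": {"Mus", "Mammalia"},
    "Rattus": {"Rattus", "Mammalia"},
    "Bos": {"Bos", "Mammalia"},
    "Xenopus": {"Xenopus", "Amphibia"},
}

def in_taxon(genus, taxon):
    """True if the genus is the taxon itself or is classified under it."""
    return taxon in LINEAGE[genus]

def inclusive_match(family_genera, list1, list2=()):
    """Inclusive search: every taxon of list 1 is represented, none of list 2 is."""
    each_present = all(
        any(in_taxon(genus, taxon) for genus in family_genera) for taxon in list1
    )
    any_excluded = any(
        in_taxon(genus, taxon) for genus in family_genera for taxon in list2
    )
    return each_present and not any_excluded

# The three usage cases listed above.
print(inclusive_match({"Homo", "Bos"}, ["Primates"]))
print(inclusive_match({"Homo", "Mus", "Bos"}, ["Homo", "Mus"]))
print(inclusive_match({"Bos", "Mus", "Rattus", "Xenopus"}, ["Mammalia"], ["Primates"]))
```

Note that, in this reading, species outside list 1 (Bos or Xenopus in the examples) do not prevent a family from being selected.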
Exclusive Search:
Any family containing only species from all the taxa of the list (i.e. none from other taxa) will be selected.
Usage:
- if you specify Primates in list 1 (with an empty list 2), you will get all the families with sequences from Primates only.
- if you specify Homo and Mus in list 1 (with an empty list 2), you will get all the families with at least one sequence from Homo and one sequence from Mus and no sequence from any other species (for example a 3-sequence family with 2 sequences from Homo and one sequence from Mus).
- if you specify Homo and Primates in list 1 (with an empty list 2), you will get all the families with sequences from Primates only and at least one sequence from Homo (for example a 5-sequence family with 3 sequences from Homo and 2 sequences from Pan).
- if you specify Mammalia in list 1 and Primates in list 2, you will get all the families with sequences from Mammalia only and no sequence from Primates (for example an 18-sequence family with 3 sequences from Bos, 7 sequences from Mus and 8 sequences from Rattus).
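The exclusive rule can be sketched in the same spirit. Again this is a toy model under stated assumptions, not the Cross-Taxa implementation: the LINEAGE table is a hypothetical, simplified taxonomy, and a family is represented by the set of genera it contains.

```python
# Hypothetical, simplified taxonomy: each genus maps to the taxa it belongs to.
LINEAGE = {
    "Homo": {"Homo", "Primates", "Mammalia"},
    "Pan": {"Pan", "Primates", "Mammalia"},
    "Mus": {"Mus", "Mammalia"},
    "Rattus": {"Rattus", "Mammalia"},
    "Bos": {"Bos", "Mammalia"},
    "Xenopus": {"Xenopus", "Amphibia"},
}

def in_taxon(genus, taxon):
    """True if the genus is the taxon itself or is classified under it."""
    return taxon in LINEAGE[genus]

def exclusive_match(family_genera, list1, list2=()):
    """Exclusive search: list 1 taxa are all represented and nothing else is."""
    each_present = all(
        any(in_taxon(genus, taxon) for genus in family_genera) for taxon in list1
    )
    # Every genus of the family must fall under at least one taxon of list 1.
    only_listed = all(
        any(in_taxon(genus, taxon) for taxon in list1) for genus in family_genera
    )
    any_excluded = any(
        in_taxon(genus, taxon) for genus in family_genera for taxon in list2
    )
    return each_present and only_listed and not any_excluded

# Usage cases listed above.
print(exclusive_match({"Homo", "Mus"}, ["Homo", "Mus"]))
print(exclusive_match({"Homo", "Pan"}, ["Homo", "Primates"]))
print(exclusive_match({"Bos", "Mus", "Rattus"}, ["Mammalia"], ["Primates"]))
```

The only difference from the inclusive rule in this sketch is the additional only_listed condition, which rejects a family as soon as one of its species lies outside list 1.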
Selection of families by number of sequences or species
You can select families by their number of sequences and/or by their number of species. For example, this is useful to avoid families containing only one sequence or one species.
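As a rough illustration, such a pre-selection amounts to a simple filter on the precalculated counts. In the Python sketch below, the family names, the counts and the use of minimum thresholds are all assumptions made for the example.

```python
# Hypothetical families: name -> (number of sequences, number of species).
FAMILIES = {
    "FAM_A": (1, 1),
    "FAM_B": (4, 1),
    "FAM_C": (6, 3),
}

def select_by_size(families, min_sequences=1, min_species=1):
    """Keep the families that reach both minimum counts."""
    return [
        name
        for name, (n_sequences, n_species) in families.items()
        if n_sequences >= min_sequences and n_species >= min_species
    ]

# Discard families with a single sequence, then those with a single species.
print(select_by_size(FAMILIES, min_sequences=2))
print(select_by_size(FAMILIES, min_sequences=2, min_species=2))
```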
Nota Bene:
The numbers of sequences and taxa displayed with the list of families are correct for protein sequences only.
If you are using a nucleotide database, the real numbers of sequences and taxa in the family
(as given on the associated family page) may be different.
Moreover, slight differences can occasionally appear between the precalculated numbers of taxa and sequences given with the list and the real ones given on the family page, even for protein databases.
Example
An example of use is given here
List Name
Under WWW-Query, the result of each query is saved in a file stored locally on our server. In this way, it is not immediately lost and the user can re-use it to build other queries or to perform further processing.
The lists are stored in a sub-directory of /ftp/ftpdir/pub/ADE-User/data/ created via a cookie for the user (your data are currently stored in the directory /ftp/ftpdir/pub/ADE-User/data/1456714067; you can check your previous operations here).
It is up to the user to give a name to a list. If no name is given, the system uses the default name list. Be aware that some lists are
created automatically by the system. These lists are always called list
and overwrite any list previously defined with this name. The sequence list of a family
"FAMILY_NAME" is automatically called "FAMILY_NAME_lst" (or "P_FAMILY_NAME_lst" after a species selection).
Note that files older than 1 week in the user's directory are
automatically deleted.
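The naming rules above can be summarised in a few lines of Python. This is only a sketch of the conventions as described; the function names are hypothetical and do not correspond to anything exposed by the server.

```python
DEFAULT_LIST_NAME = "list"

def resolve_list_name(user_name=""):
    """Return the name under which a query result is saved."""
    # With no name given, the result is saved as "list" and replaces
    # any earlier list saved under that same name.
    return user_name if user_name else DEFAULT_LIST_NAME

def family_list_name(family_name, after_species_selection=False):
    """Return the automatic name of the sequence list of a family."""
    prefix = "P_" if after_species_selection else ""
    return f"{prefix}{family_name}_lst"

print(resolve_list_name())
print(resolve_list_name("my_polymerases"))
print(family_list_name("HBG000020"))
print(family_list_name("HBG000020", after_species_selection=True))
```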
Frequently Asked Questions
This page is under development, sorry. Last update = January 7, 2004.
- How can I retrieve a protein or a gene?
- I know the name of a sequence, what can I do with it?
- There are a lot of databases available, which one should I use?
- I do not find my sequence in your databases. Why?
- The buttons do not work... (in construction)
- I cannot select the database... (in construction)
- How can I retrieve sequences associated with a keyword? (in construction)
- How can I retrieve sequences associated with a taxon? (in construction)
- What are families? (in construction)
- What is the aim of the family databases? (in construction)
- How can I retrieve families associated with a keyword? (in construction)
- How can I retrieve families associated with a taxon? (in construction)
- Which family database should I use? (in construction)
- What is the meaning of the nucleotide and protein buttons? (in construction)
- What is the meaning of the sequence and family buttons? (in construction)
- How to use WWW-Query? (in construction)
- How to use Cross-Taxa? (in construction)
- Where are my data stored?
- How can I retrieve my data?
- What is the difference between Cross-Taxa and WWW-Query?
- How can I retrieve a protein or a gene?
You should go to the WWW-Query page (here). This is an "expert-user" page allowing complex queries.
- For a quick search, click on the button "Quick Search".
This page retrieves all the sequences (or families)
associated with a word, which can be a name, an accession number, a keyword, a species, etc.
The results are thus more exhaustive than with WWW-Query.
To retrieve a sequence, use the left form of the page. Enter a word, select a database, then click on "submit". If you check the "exact match" box, only exact matches will be retrieved. Several lists of sequences (or families) are usually generated. For example, if you search for the word "BTG1" in SWISS-PROT, a list (called "name") of sequences whose name matches the word "BTG1" and a list (called "keyword") of sequences with a keyword matching the word "BTG1" are generated. All these sequences are then grouped into a global list (called "all") and displayed. If the "exact match" box is checked, only the sequences associated with the keyword "BTG1" are retrieved.
- For a simple query, click on the button "Go to Simple Search". This page allows you to retrieve sequences according to simple criteria such as the sequence name, the accession number or a keyword.
  - First of all, choose whether the database you want to query is a database of protein sequences or of nucleotide sequences. Selecting the "protein" or the "nucleotide" button changes the databases available in the database list to the right of the buttons. For example, if you select the "protein" button, the SWISS-PROT database will be available; if you then select the "nucleotide" button, the GenBank database will be available.
  - Once you have selected the database to query, for example SWISS-PROT, choose a criterion in the list and enter a word in the field. For example, if you want to retrieve the sequence BTG1_HUMAN in SWISS-PROT, select the criterion "Sequence name" and type "BTG1_HUMAN" in the field, then click on the SUBMIT button.
  - Use of the wild card: the star character "*" allows you to retrieve sequences according to a name or a keyword
matching a given word. For example, if you enter "BTG1_*" instead of "BTG1_HUMAN", you will retrieve all the sequences
in SWISS-PROT with a name beginning with BTG1_.
Similarly, for a search by keyword, you may type RNA POLYMERASE, or *POLYMERASE, or RNA*.
- Complex queries are possible on the WWW-Query page (also accessible via the "Go to Expert Search" button). First, choose whether you want to retrieve sequences or families of sequences. You can then fill in the form as for a simple query, except that more criteria are available and that you can combine several of them (this is optional: if you want to use only one criterion, leave the 3 other fields empty).
- I know the name of a sequence, what can I do with it?
You can
  - retrieve this sequence in one of the databases to get its annotations and its sequence data, or to apply several bioinformatics tools such as BLAST, CLUSTALW, secondary structure prediction, pattern search and many NPSA tools,
or
  - retrieve the family associated with this sequence, get all the sequences in the family and modify this list of sequences if needed, apply several bioinformatics tools to these sequences, display the alignment and the phylogenetic tree, get the partial alignment of the sequences associated with particular species, etc.
- There are a lot of databases available, which one should I use?
Several databases are available on the server:
  - General databases, such as EMBL, GenBank or SWISS-PROT, can be queried with the different tools and utilities provided by the PBIL. These databases are regularly updated (daily for GenBank and EMBL, weekly for SWISS-PROT).
  - Other, specialized databases are dedicated to particular organisms, molecules, functions and/or phylogenetic analysis.
For example, the Hobacgen database contains families of homologous genes from bacteria and archaea.
These databases are described on the home page of the server.
Database contents are given here.
- I do not find my sequence in your databases. Why?
First of all, your sequence may simply not be present in the database you are querying (for example, if you are looking for a protein sequence in EMBL, for an animal sequence in Hobacprot/Hobacnucl, or for a CDS in Hobacprot). See this question for more information about the different databases.
The name and the accession number of the sequence may also have been confused when using WWW-Query.
WWW-Query allows you to search for a sequence by its name or by its accession number;
for example, if an accession number is given instead of the name, the sequence will not be retrieved.
Alternatively, Quick Search allows you to retrieve all the sequences associated with a word,
which can be a name, an accession number, a keyword, a species, etc.
The results are thus more exhaustive than with WWW-Query.
Finally, in several databases, such as Hoverprot and Hobacprot, the sequence names can be slightly different from the SWISS-PROT ones, due to the duplication of the sequences. To avoid this problem, use the accession number instead of the sequence name to retrieve your sequence.
- Where are my data stored?
Under WWW-Query and Cross-Taxa, the result of each query is saved in a file stored locally on our server.
In this way, it is not immediately lost and the user can re-use it to build other queries or to perform further processing.
Thanks to the storage zone defined for each user, there is no confusion when many users are generating lists with the same name at the same moment.
The lists (of sequences or families) are stored in a sub-directory of ftp://pbil.univ-lyon1.fr/pub/ADE-User/data created
via a cookie for the user (for example, your data are currently stored in the directory
ftp://pbil.univ-lyon1.fr/pub/ADE-User/data/1456714067, and you can check your previous operations here).
It is up to the user to give a name to the list to be generated.
If no name is given, the system uses the default name list.
Be aware that some lists are created automatically by the system.
These lists are always called list and overwrite any list previously defined
with this name.
The sequence list of a family named "FAMILY_NAME" is automatically called "FAMILY_NAME_lst"
(or "P_FAMILY_NAME_lst" after a species selection).
Other data, such as alignment files or phylogenetic tree files, are stored in the user directory as well.
Partial alignments are stored in a sub-directory of the user directory called ALN.
Note that files older than 1 week in the user's directory are automatically deleted.
- How can I retrieve my data?
You can download all your data at the URL ftp://pbil.univ-lyon1.fr/pub/ADE-User/data/1456714067
It is recommended that you use a dedicated FTP client to retrieve them instead of a Web browser like Netscape or Internet Explorer.
You can also retrieve sequence data with the Retrieve button.
- What is the difference between Cross-Taxa and WWW-Query?
WWW-Query allows you to retrieve sequences or families, whereas Cross-Taxa is used to retrieve families only.
WWW-Query retrieves all the sequences which fulfil several criteria of different kinds, then
generates the list of these sequences, or the list of the families associated with these sequences.
Cross-Taxa retrieves families on a taxonomic basis, allowing a more precise taxonomic selection than WWW-Query.
It is possible to combine results from Cross-Taxa and WWW-Query
(for example, to cross a family list generated with Cross-Taxa and a family list generated with WWW-Query).