PRABI-Doua

Pôle Rhône-Alpes de Bioinformatique Site Doua

ACNUC FORTRAN Application Programming Interface

      ACCESSING SEQUENCES OF AN ACNUC DATABASE FROM USER FORTRAN PROGRAMS

   Sequences or subsequences (e.g. protein, tRNA or rRNA
genes) can be read from an ACNUC database by your own FORTRAN
programs using the following API.
The same interface works with all ACNUC databases and structures
(GenBank, EMBL, SwissProt or NBRF/PIR).

Seven subroutines/functions (GSNUML, GSNUMLPHA, GFRAG, LIBSUB, GOPEN, CODAA, CLOSEACNUC) 
are provided for your programs to use.

In outline, starting with the sequence name, use subroutine
GSNUML to obtain the sequence length and number in the database and
subroutine GFRAG to read its bases or amino acids, or a fragment of the
sequence. You can also use routine LIBSUB to obtain a short textual
description of the sequence. Protein translation using the appropriate
reading frame and genetic code is also possible (see example 3, below).
Call GOPEN once at the beginning of your program to gain access to ACNUC,
and call CLOSEACNUC to close the database when it is no longer needed.

Subroutine GSNUML: to get sequence or sub-sequence number and length from its name
  CHARACTER NAME*16
  CALL GSNUML(NAME,NUM,LENGTH)
      NAME:  character string *16 containing the sequence 
             name.
      NUM: upon return, the sequence number in the database,
           or 0 if NAME is not an existing sequence name.
      LENGTH: upon return, the sequence length in nucleotides.

Subroutine GSNUMLPHA: to get sequence or sub-sequence number, length, 
reading frame and genetic code from its name
  CHARACTER NAME*16
  CALL GSNUMLPHA(NAME,NUM,LENGTH,FRAME,CODE)
      NAME:  character string *16 containing the sequence 
             name.
      NUM: upon return, the sequence number in the database,
           or 0 if NAME is not an existing sequence name.
      LENGTH: upon return, the sequence length in nucleotides.
      FRAME: upon return, the reading frame (0,1,2) of the coding sequence
      CODE: upon return, the genetic code id (0 for standard code)

Subroutine GFRAG: to read all or part of a sequence or a sub-sequence
  CHARACTER SEQ*`some_adequate_length'
  CALL GFRAG(NUM,IFIRST,LFRAG,SEQ)
      NUM:  the sequence number (returned by GSNUML).
      IFIRST:  the position in sequence of the 1st base to
               be read.
      LFRAG:  the number of bases to be read, starting at
              position IFIRST. Upon return, LFRAG contains
              the number of bases actually read. It can be
              smaller than the input LFRAG value if
              IFIRST+LFRAG-1 is larger than the sequence length.
              LFRAG is returned as 0 in case of error (illegal
              sequence number, length of SEQ too short,
              illegal IFIRST value).
      SEQ:  a character string of length greater than LFRAG
            that will contain upon return the bases read.
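
The behaviour of LFRAG on return can be illustrated with a minimal sketch.
It assumes that the subsequence ECOTGP.TRPA used in the examples exists in
the opened database and is at least 31 bases long; the program name CLAMP
and the window sizes are arbitrary choices made for illustration. It asks
for 50 bases starting 30 bases before the end of the sequence, so by the
description above only 31 bases should come back.

```fortran
c sketch: request 50 bases starting 30 bases before the sequence end
      program clamp
      character name*16,seq*100
      integer num,length,ifirst,lfrag
      call gopen
      name='ecotgp.trpa'
      call gsnuml(name,num,length)
      if(num.eq.0)stop 'invalid sequence name'
      if(length.lt.31)stop 'sequence too short for this sketch'
      ifirst=length-30
      lfrag=50
c lfrag is modified by gfrag: pass a variable, never a constant
      call gfrag(num,ifirst,lfrag,seq)
      if(lfrag.eq.0)stop 'gfrag reported an error'
c only positions ifirst..length exist, so lfrag should now be 31
      print*,'bases read:',lfrag
      print*,seq(1:lfrag)
      call closeacnuc
      end
```

Because LFRAG is both an input and an output argument, it is safest to
reset it before every call, as example 2 below does with l=k.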

Subroutine LIBSUB: to get a short description of a sequence or sub-sequence
  CHARACTER LIBEL*80
  CALL LIBSUB(NUM,LIBEL)
	NUM: the sequence number (returned by GSNUML).
	LIBEL: character*80 string returned with the sequence or
               sub-sequence name and a short description of it.

Subroutine GOPEN: To gain access to ACNUC files.
  CALL GOPEN
	Call it once at the beginning of the program.

Subroutine CLOSEACNUC: To close access to ACNUC files.
  CALL CLOSEACNUC

Function CODAA: translates 3 bases into an amino-acid using a given genetic code
  CHARACTER CODAA*1,RESIDUE*1,CODON*3
  INTEGER GEN_CODE
  RESIDUE=CODAA(CODON,GEN_CODE)  
	CODON: a 3-base codon
        GEN_CODE: an integer specifying the genetic code in use (see example 3
                  below for detailed description of its usage)
                  0 denotes the `standard' genetic code
        RESIDUE: a one-character amino acid (* is returned for a stop codon)
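
As a minimal sketch of a direct call to CODAA, the program below translates
two single codons with the standard genetic code (GEN_CODE=0). The program
name ONECOD is hypothetical. GOPEN is called first, as recommended for all
programs; whether CODAA strictly requires it is not stated here, so that
call is an assumption. Lowercase codons are used because that is what GFRAG
returns for GenBank and EMBL. Under the standard code, atg encodes
methionine and taa is a stop codon, so the expected residues are M and *.

```fortran
c sketch: translate single codons with the standard genetic code (0)
      program onecod
      character*1 codaa,residue
      character codon*3
      integer gen_code
c gopen is called first, as in the examples (assumed to be harmless)
      call gopen
      gen_code=0
c a methionine codon
      codon='atg'
      residue=codaa(codon,gen_code)
      print*,codon,' -> ',residue
c a stop codon
      codon='taa'
      residue=codaa(codon,gen_code)
      print*,codon,' -> ',residue
      call closeacnuc
      end
```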


                         EXAMPLES


c declarations: sequence names must be declared with 16 characters
      character name*16,seq*5000,libel*80
c open the necessary files
      call gopen
c process for example ECOTGP.TRPA subsequence
      name='ecotgp.trpa'  !sequence names are case-insensitive
      call gsnuml(name,num,length)
      if(num.eq.0)stop'invalid sequence name'
c get and print a short textual description of it
      call libsub(num,libel)
      print*,libel
c example 1: read the complete sequence in memory
      call gfrag(num,1,length,seq)
      if(length.eq.0)stop'sequence is too long for string seq'
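c example 2: read it successively by pieces of k bases
c k is the piece size; 100 is an arbitrary illustrative value
      k=100
      do 10 i=1,length,k
c reset l before each call because gfrag overwrites it
      l=k
      call gfrag(num,i,l,seq)
c      .
c      . process the l bases read in seq(1:l)
c      . generally l=k, except possibly for the last piece
c      .
10    continue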


c Example 3: translate a protein coding region using the
c appropriate genetic code and reading frame:
c the protein sequence will be in string PROT using the 1-letter code
	IMPLICIT INTEGER(A-Z) !everything not character string is integer
	CHARACTER NAME*16,SEQ*3000,PROT*1000
	CHARACTER*1 CODAA    !prepare for using function CODAA
	CALL GOPEN           !open access to the database
	NAME='HUMMTCG.PE1'   !example: a human mt protein gene
c obtain the reading frame (0,1, or 2)
c obtain the genetic code as known by ACNUC
	CALL GSNUMLPHA(NAME,NUM,LENGTH,FRAME,CODE)
	CALL GFRAG(NUM,FRAME+1,LENGTH,SEQ)	!read the complete sequence
                                            !note the use of var FRAME
	J=0
	DO 1 I=1,LENGTH-2,3
	J=J+1
1	PROT(J:J)=CODAA(SEQ(I:I+2),CODE)   !function codaa translates a codon
                                           !using the code specified by CODE
	LPROT=J             !lprot=length of protein sequence in string PROT




Notes: (1) GFRAG returns lowercase nucleotides for GenBank and EMBL,
and uppercase for NBRF.
       (2) The GFRAG subroutine contains a large internal buffer,
so reading a sequence in small pieces incurs no significant
performance penalty.
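
Since the letter case returned by GFRAG depends on the database (note 1), a
program meant to work with several databases may want to normalize it. The
following is a minimal sketch of such a helper; the subroutine name UPSEQ is
hypothetical and not part of the ACNUC API, and the code assumes a character
set in which a-z are contiguous (true for ASCII).

```fortran
c sketch: convert the first n characters of seq to uppercase
      subroutine upseq(seq,n)
      character*(*) seq
      integer n,i,shift
c offset between lowercase and uppercase letters (32 in ASCII)
      shift=ichar('a')-ichar('A')
      do 30 i=1,min(n,len(seq))
      if(seq(i:i).ge.'a'.and.seq(i:i).le.'z')
     +   seq(i:i)=char(ichar(seq(i:i))-shift)
30    continue
      end
```

A typical use would be call upseq(seq,l) right after call gfrag(num,i,l,seq),
so that the rest of the program only has to deal with uppercase letters.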


                  USING THE FORTRAN ACNUC INTERFACE UNDER UNIX

User-written FORTRAN programs that use the above-defined ACNUC API must
be linked to the ACNUC C library, libcacnuc.a. For example:
f77 -o myprog myprog.f -L. -lcacnuc

The C library is built by downloading the C source code
and then running:
tar xf acnucsoft.tar
make libcacnuc.a


The environment variables acnuc and gcgacnuc are used by all ACNUC programs.