One code to find them all
Version 1.0
CORRECTION: The way FASTA files are handled has been corrected; if you had issues with scaffold names, they should now disappear (thanks to H. Lopez for helping to find the bug).
CORRECTION: One code now returns correct reverse-complement sequences; there was a bug with reverse-complement sequences composed of joined subparts (thanks to M. Seidl for finding the bug).
NEW: A small script to sum all *copynumber* outputs is available here (Thanks to P. Koch for the suggestion!).
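The reverse-complement correction above concerns TE copies built from several joined fragments. The Python sketch below illustrates why fragment order matters on the minus strand. It is not the Perl code shipped with One code, and the function names and the 0-based, half-open coordinates are assumptions made for illustration. The fragments are joined in genomic order and the joined sequence is then reverse-complemented as a whole. This is not the same as reverse-complementing each fragment in place.

```python
# Illustrative sketch only; names and coordinate convention are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def copy_sequence(genome: str, fragments: list, strand: str) -> str:
    """Join fragments (0-based, half-open (start, end) tuples) in genomic order;
    for a minus-strand copy, reverse-complement the joined sequence as a whole."""
    joined = "".join(genome[start:end] for start, end in sorted(fragments))
    return revcomp(joined) if strand == "-" else joined

if __name__ == "__main__":
    genome = "AAACCCGGGTTT"
    print(copy_sequence(genome, [(0, 3), (6, 9)], "+"))  # AAAGGG
    print(copy_sequence(genome, [(0, 3), (6, 9)], "-"))  # CCCTTT
```

With this toy genome, reverse-complementing each fragment in place and keeping the original order would give TTTCCC instead of CCCTTT.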
One code to find them all is a set of Perl scripts that extract useful information about transposable elements (TEs) from RepeatMasker output files, retrieve their sequences and compute quantitative summaries. Its main features are:
- Assemble RepeatMasker hits into complete TE copies, including LTR retrotransposons
- Retrieve the corresponding TE sequences, and their flanking sequences, from local FASTA files
- Compute summary statistics for each TE family (number of TE copies, genome coverage...)
- Assemble ambiguous cases, such as nested TEs, into copies either automatically or manually
- Work with a user-defined TE library
- Work with only a user-chosen subset of TE families
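As an illustration of the per-family summary statistics listed above, here is a minimal Python sketch. It assumes copies have already been assembled and are given as hypothetical (family, start, end) tuples with 0-based, half-open coordinates. This is not One code's output format. Here, coverage is taken as the summed copy length divided by genome size.

```python
from collections import defaultdict

def family_summary(copies, genome_size):
    """copies: iterable of hypothetical (family, start, end) tuples for assembled
    TE copies. Returns {family: (n_copies, total_bp, fraction_of_genome)}.
    Note: overlapping copies (e.g. nested TEs) are counted twice here."""
    counts = defaultdict(int)
    total_bp = defaultdict(int)
    for family, start, end in copies:
        counts[family] += 1
        total_bp[family] += end - start
    return {
        fam: (counts[fam], total_bp[fam], total_bp[fam] / genome_size)
        for fam in counts
    }

if __name__ == "__main__":
    toy_copies = [("FamilyA", 0, 100), ("FamilyA", 200, 250), ("FamilyB", 300, 400)]
    print(family_summary(toy_copies, genome_size=1000))
```

On the toy input, FamilyA has 2 copies covering 150 bp (0.15 of the genome) and FamilyB has 1 copy covering 100 bp (0.1).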
Download the code
Perl scripts |
Tutorial |
- The downloaded source archive contains the two Perl scripts, a (hopefully informative) README and the license file.
- The scripts run on any OS with Perl installed; see the README file for more information.
- The tutorial archive contains a set of example input and output files with different options.
Dictionaries for LTR-retrotransposons
The script build_dictionary.pl matches the internal and LTR subparts of LTR retrotransposons, but its output files need to be checked and completed manually. Here are some manually curated dictionaries, kindly provided by those who made them. If you have made one and want to make it available to the community, please contact us and we will add it to the list.
Organism | Repeat library version | Contributor(s) |
Drosophila melanogaster | (RM library release 20061006) | E. Lerat, LBBE, Univ. Lyon 1. |
Homo sapiens | (RM library release 20120124) | E. Lerat, LBBE, Univ. Lyon 1. |
Arabidopsis thaliana | (RM library release 20071204) | A. Haudry & E. Lerat, LBBE, Univ. Lyon 1. |
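To show what such a dictionary is used for, here is a Python sketch in which a hypothetical mapping links each LTR or internal subpart name to the element it belongs to. The subpart names and the in-memory format are assumptions and do not reproduce the files written by build_dictionary.pl. With such a mapping, two RepeatMasker hits can be recognized as parts of the same LTR retrotransposon even though their names differ.

```python
from typing import Optional

# Hypothetical dictionary: subpart name (LTR or internal) -> element name.
# Illustrative only; not the file format produced by build_dictionary.pl.
DICTIONARY = {
    "ElemX_LTR": "ElemX",
    "ElemX_I": "ElemX",
    "ElemY_LTR": "ElemY",
    "ElemY_int": "ElemY",
}

def element_of(subpart: str, dictionary: dict = DICTIONARY) -> Optional[str]:
    """Return the element a subpart belongs to, or None if it is not listed."""
    return dictionary.get(subpart)

def same_element(hit_a: str, hit_b: str, dictionary: dict = DICTIONARY) -> bool:
    """True if both hit names are listed subparts of the same LTR element."""
    elem = element_of(hit_a, dictionary)
    return elem is not None and elem == element_of(hit_b, dictionary)

if __name__ == "__main__":
    print(same_element("ElemX_LTR", "ElemX_I"))    # True
    print(same_element("ElemX_LTR", "ElemY_int"))  # False
```

Subparts missing from the dictionary are never matched, which mirrors why incomplete dictionaries need manual completion.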
Reference
If you use One code to find them all in a published work, please cite the following reference:
- Bailly-Bechet M., Haudry A. & Lerat E. (2014) "One code to find them all": a Perl tool to conveniently parse RepeatMasker output files. Mobile DNA 5:13.