Frequently Asked Questions

Check here before you email me, please!

Necessities

Who should I cite?

This paper please! Also, see the extra programs below!

Where can I find out more details of how this program runs?

Have a read of the paper!

It randomly crashes on Windows and I don't know what's going on!

Please see the note below about BEAST. Consider opening 'command prompt', and running the program through that (you can drag-drop 'phyloGenerator.exe' into command prompt and press enter). That way, if the program suddenly exits, you'll be able to see the error messages it generates.

Also, phyloGenerator needs to write out data (like your alignments), and if it can't do that it will crash. You need to have permission to write information wherever you run the program. If you don't have permission, the program will crash with some error like "IOError: [Errno 13] Permission denied: 'temp.fasta'" or something else that looks like it's to do with writing a file out. Your desktop should always be a safe place to put pG if in doubt!
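If you have Python 3 installed, you can check whether a folder is writable before running pG from it. This is a minimal sketch and not part of phyloGenerator; the function name `can_write_here` is made up for illustration. It simply tries to create and remove a temporary file, which is roughly what pG needs to be able to do.

```python
import os
import tempfile


def can_write_here(folder):
    """Return True if a temporary file can be created (and removed) in folder."""
    try:
        # Creating a real file is more reliable than reading permission flags,
        # particularly on Windows.
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        # Covers 'Permission denied' as well as folders that do not exist.
        return False


folder = os.getcwd()  # the folder you are running from
print(folder, "is writable" if can_write_here(folder) else "is NOT writable")
```

If this reports that the folder is not writable, moving pG to your desktop (as suggested above) is the simplest fix.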

What programs does this use? And how do I cite them?

Only cite the programs you use from the following list:

But you must cite BioPython and its Bio.Phylo component. Sorry not to be more help than this, but it's best the authors tell you what to cite so I don't get in trouble!

Why is my directory not writable/why can't I write files to my working directory or to where phyloGenerator is installed?

For reasons that are unclear to me, Windows will let you install phyloGenerator to a place where it can't write out files (I say unclear to me, because you had to write files to copy phyloGenerator in the first place, right?...) This means pG can't pass information to other programs, because it can't write out the sequences it's downloaded for them to use. Picking an output working directory that you don't have permission to save files to will also cause problems! You should always be able to write files to your desktop, so if in doubt put phyloGenerator on your desktop and run it from there. That should fix this problem; I'm afraid I can't offer any advice about which parts of your computer you have permission to write files to, because I'm not sat at your computer! Sorry!

What is the referenceDownload method? How can it help me make very large phylogenies?

It uses examples of what 'good' sequences of the genes you're using look like to guide its search. It sorts candidate sequences according to length, attempts to find genes inside those sequences where necessary, and tries to align candidate sequences to your example sequences. If that alignment isn't too long (that's what the 'tolerance' you set does) then that sequence is accepted, and pG moves on to the next sequences.
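The accept/reject logic described above can be sketched roughly as follows. This is a toy illustration, not pG's actual code, and it rests on several assumptions: that candidates are tried longest-first (the text only says they are sorted by length), that 'tolerance' is the largest allowed ratio of alignment length to reference length, and that the alignment itself is done by a function you supply (`aligned_length` is hypothetical). The gene-finding step is left out entirely.

```python
def pick_sequence(candidates, reference_length, tolerance, aligned_length):
    """Return the first candidate whose alignment to the reference is short enough.

    candidates       -- list of sequence strings for one species and gene
    reference_length -- length of your example ('reference') sequence
    tolerance        -- ASSUMPTION: largest allowed ratio of alignment length
                        to reference length (pG may define this differently)
    aligned_length   -- HYPOTHETICAL function: takes a candidate and returns
                        the length of its alignment against the reference
    """
    # ASSUMPTION: longest candidates are tried first.
    for seq in sorted(candidates, key=len, reverse=True):
        if aligned_length(seq) <= reference_length * tolerance:
            return seq  # accepted, so stop checking further candidates
    return None  # no candidate passed the check
```

The point of the sketch is the trade-off mentioned below: each candidate costs one alignment, so accepting the first one that passes keeps the search fast at the price of a less thorough check.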

I'm actively playing around with referenceDownload quite a lot (there is a big trade-off between checking sequences thoroughly and speed), but it has allowed me to make pretty large phylogenies (>1000 spp.) rapidly, and with very little input on my part. I'd strongly advise using this method for large phylogenies, and I'd recommend you just cite the pG paper and mention something along the lines of 'using the VERSION referenceDownload method'. Please let me know what you think!

I'm worried about taxonomy. How does pG handle taxonomy?

By default, quite poorly - it's just searching for species names! If that makes you shudder, use the '-taxonIDs' argument to specify particular taxon IDs from GenBank. If you have a way of getting more precise about taxonomy than that, please get in touch with me about it!

While you're here, let me say a few words about taxonomy and GenBank, because a lot of people bring this up. I actually happen to think the GenBank TaxonID system is very good, and does a pretty good job of keeping everything up-to-date and taxonomically reasonable, particularly given it attempts to do so for everything in GenBank! However, I completely agree that taxonomy and name resolution are hard, and I think being careful about these things is important. The advantage pG does have going for it is that it's a reproducible method; your mistakes/successes can be described and repeated by stating what you made pG do, and I think that's a useful first step in getting things right.

I get errors when I download sequences - something about DTD files!

This isn't a bug, it's a 'feature'. In other words, this doesn't affect your download. Just leave the program running; your sequences are being downloaded and interpreted correctly.

My phylogeny doesn't look anything like it should do. What's gone wrong?

In my experience, it's rare for phyloGenerator not to give you a good result, but bear in mind that no one can write an automated program that works every time with something like this - otherwise there wouldn't be phylogeneticists in the first place! For difficult projects, there's not likely to be a quick solution. Sorry. Please do contact me because I like to know when runs haven't worked. However, there are two major sources of error that phyloGenerator can help with: absence of a constraint tree, or a bad alignment.

Use Phylomatic to make a constraint, take one from a paper - whatever - that way you can't get something you wouldn't expect. If you're using a dated phylogeny, make sure you name the dated clades in your phylogeny - something like '((A:2,B:2), C:3)Named.clade:5' as opposed to '((A:2, B:2), C:3):5'.
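To check quickly whether a constraint tree has any named clades, you can look for labels that come straight after a closing bracket. The following is a minimal sketch using only Python's standard library; `named_clades` is a made-up helper, and it is a rough text search, not a full Newick parser. In particular, numeric support values written after a bracket would be reported as if they were names.

```python
import re

# A label directly after a closing bracket names an internal node (a clade).
INTERNAL_LABEL = re.compile(r"\)\s*([^\s():,;\[\]]+)")


def named_clades(newick):
    """Return the internal node labels found in a Newick string."""
    return INTERNAL_LABEL.findall(newick)


print(named_clades("((A:2,B:2), C:3)Named.clade:5"))
print(named_clades("((A:2, B:2), C:3):5"))
```

An empty list for a dated tree suggests none of the clades are named.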

Check your alignment. phyloGenerator will warn you (the column marked 'warn') if your alignments seem too long. Output them, open them in a program like Clustal-X2, and see if there are long stretches of gaps, or an alignment that doesn't look like a set of neatly-lined-up sequences. If you find that, go back to the sequence download stage, trim your sequences, check the lengths, and maybe think about using TrimAl in the alignment stage.

Finally, you may find some species are on incredibly long branches, or that your phylogeny smoothed with PATHd8 has lots of polytomies (multiple branches coming from the same node). The latter is caused by the former, because the long branches make the small branches look so small they get averaged down to zero length by PATHd8. The solution is to find better sequences for those long-branch species (see the section on DNA download in the guide), or, if that doesn't work, to try adding in a few more species that are relatives of that long branch group. If you can't fix this problem using these methods, do drop me an email, but you may have hit a problem that requires the expertise of a phylogeneticist. Sorry!

I can't see the files pG is writing out / it crashes when I'm trying to write out files ('OSErrors')

Whenever you need to input a file or directory, you'll need to give phyloGenerator the 'absolute path' to the file. Something like '/Users/will/Documents/dna.fasta', or 'C:\Documents and Settings\dna.fasta'. If you're on a Mac or Linux computer, do not use something like '~/Documents/dna.fasta' - only '/Users/will/Documents/dna.fasta' will work. Doing anything else will produce errors during the program, and because pG uses so many other programs it's hard for me to predict exactly when that might happen. Sorry!
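If you are unsure what the absolute path of a file is, Python's standard library can work it out for you. This is a small sketch, assuming you have Python 3 installed; `to_absolute` is a name invented for this example. It expands '~' to your home folder and turns a relative path into a full one, which you can then paste into pG.

```python
import os


def to_absolute(path):
    """Expand '~' and make the path absolute."""
    return os.path.abspath(os.path.expanduser(path))


print(to_absolute("~/Documents/dna.fasta"))
```

The result depends on your user name and on the folder you run it from, so check that the printed path really points at your file.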

I can't get PATHd8 to work on a PC!

First, try installing Cygwin (type 'cygwin' when prompted to go to their website). If that doesn't work, you can try copying a DLL into the 'requires' folder inside the 'phyloGenerator' folder on your computer - again, you can download this when prompted too. If that doesn't work, please contact me. Those two steps have worked for everyone so far, but if they don't for you I want to know!

Whenever phyloGenerator gets to the BEAST stage it crashes or suddenly exits.

BEAST uses something called Java. If you're on a Mac, please install the latest version of Java as many users (including me!) cannot run pG with the limited version that ships with MacOS. It seems that some Windows PCs have been configured to have multiple copies of Java on their computer, but by default the oldest version of Java is used. This causes problems for BEAST, and so it crashes. You need to modify your 'PATH' variable so that its first entry is the directory where your latest Java installation is (here is a walkthrough a friend found helpful). Note that you'll probably want to add something like 'C:\Program Files\Java\java16\bin'. A good way of making sure it's this problem is to go into the 'requires' folder inside phyloGenerator, and double-click on 'BEAST v1.7.1.exe'. If that won't run, you need to update your version of Java (try the 'Java' section of Control Panel), and if it does then you need to follow the instructions above. I'm sorry that I can't automate this for you, but if you run into problems do drop me an email. Only one person has had this problem (to my knowledge!) if that's of any consolation!
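To see which Java installations your 'PATH' variable points to, and in what order, you could run something like the sketch below. It is not part of phyloGenerator, and `java_on_path` is a hypothetical helper. It assumes the Java launcher is called 'java.exe' (Windows) or 'java' (Mac/Linux), and lists every folder on PATH that contains one; the first folder listed is the one that will normally be used.

```python
import os


def java_on_path(path_value=None):
    """List every folder on PATH that contains a Java launcher, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    names = ("java.exe", "java")  # assumed launcher names
    found = []
    for folder in path_value.split(os.pathsep):
        folder = folder.strip('"')  # PATH entries are sometimes quoted on Windows
        if folder and any(os.path.isfile(os.path.join(folder, n)) for n in names):
            found.append(folder)
    return found


for position, folder in enumerate(java_on_path(), start=1):
    print(position, folder)
```

If more than one folder is printed and the first is not your newest Java, that matches the problem described above.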

How can I exit phyloGenerator in the middle of a run?

Press 'control-c' a few times. On a PC, you'll get a dialogue box warning of an error - just hit enter and ignore it. Doing so during a RAxML run will cause problems - see below.

I exited phyloGenerator in the middle of a RAxML run, and now it won't work.

You've got a temporary file on your computer that's stopping RAxML running. Search for all files containing 'RAxML_' (note the capitalisation), and remove them if they're anywhere near phyloGenerator or have the word 'temp' in them. Otherwise, just install a fresh copy of phyloGenerator.
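If your computer's search tool is awkward to use, a few lines of Python can list the candidate files for you. This is a sketch under the assumption that you have Python 3 installed; `find_raxml_leftovers` is a made-up name. It only lists files and deletes nothing, so you can decide which ones to remove.

```python
import os


def find_raxml_leftovers(top):
    """Return paths of files under 'top' whose names contain 'RAxML_'."""
    hits = []
    for folder, _subfolders, files in os.walk(top):
        for name in files:
            if "RAxML_" in name:  # case-sensitive, matching the capitalisation
                hits.append(os.path.join(folder, name))
    return sorted(hits)


# Search from the folder you are running in; change this to wherever pG lives.
for path in find_raxml_leftovers(os.getcwd()):
    print(path)
```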

I can't load any files, or even start phyloGenerator, despite using the correct path

Make sure you are using the correct, full, absolute path. For example, 'Demos/Silwood_Plants/sequences.fasta' is a relative (incomplete) path, but '/Users/will/Documents/phyloGenerator/Demo/Silwood_Plants/sequences.fasta' is a complete, absolute path.

Also, if you're dragging and dropping files into phyloGenerator when it asks for them, make sure there are no trailing spaces at the end of your filename. If in doubt, press delete - if the file path doesn't appear to change, you had some trailing spaces.

If your folders or files have spaces in them, most computer programs (including phyloGenerator) run into problems. On a Windows computer, putting the file and its path in double quotes (e.g., "C:\My folder\My file.txt"), and on a Mac/UNIX 'escaping' the space with a backslash (e.g., "/My\ Path/My\ file.txt") will help. If in doubt, drag-and-drop the file into your Terminal/Command Prompt to see how your computer wants the file referenced.

It warns me that a blank input wasn't recognised, or an input I typed in doesn't work.

You might have hit a button while the program was running earlier and it remembered that. Try entering the command again, but if you still get problems contact me.

phyloGenerator keeps hanging and not doing anything!

I've set phyloGenerator to pause for five seconds after every ten downloads, so as not to overload the NCBI database. However, I'm being massively conservative, and if you want to alter that (at your own risk!) then play around with the 'delay' argument when running phyloGenerator. Many of the programs phyloGenerator calls take quite a while too - if you can hear your computer whirring, it's probably just building your phylogeny.

If you're aligning sequences and it seems to be taking forever, make sure there are no abnormally long sequences in your dataset. If most of your sequences are 1000bp long, and one or two are >10000bp long, you'll need to trim the sequences or you'll crash most alignment programs.
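One way to spot abnormally long sequences is to compare each sequence with the median length in the file. The sketch below uses only Python's standard library and is not part of pG; the function names are invented, and the cut-off of five times the median is an arbitrary assumption that you should adjust for your own data.

```python
import statistics


def read_fasta_lengths(lines):
    """Return {sequence name: length} from the lines of a FASTA file."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def unusually_long(lengths, factor=5):
    """Names of sequences more than 'factor' times the median length."""
    if not lengths:
        return []
    median = statistics.median(lengths.values())
    return sorted(n for n, size in lengths.items() if size > factor * median)
```

To use it on a real file, something like `unusually_long(read_fasta_lengths(open("/full/path/to/sequences.fasta")))` would return the names worth trimming (the path here is a placeholder).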

I keep getting funny characters in my species names (backslashes, accented characters, etc.)

Make sure you have no formatting whatsoever in your input files. You may have non-breaking spaces or meta-characters inserted by Microsoft Word. If in doubt, copy and paste your input file into a really simple text editor (Notepad, TextEdit, something like that) and manually type in the species that are giving you problems.
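Hidden characters are hard to see by eye, so a short script can point to them. This is a minimal sketch, not part of pG; `suspicious_characters` is a made-up name, and the rule it uses (flag anything outside plain printable ASCII, plus backslashes) is an assumption about what counts as a problem.

```python
def suspicious_characters(lines):
    """Return (line number, character, code point) for each problem character."""
    problems = []
    for number, line in enumerate(lines, start=1):
        for char in line.rstrip("\r\n"):
            # Flag non-ASCII characters, control characters (such as tabs),
            # and backslashes.
            if ord(char) > 126 or ord(char) < 32 or char == "\\":
                problems.append((number, char, "U+%04X" % ord(char)))
    return problems
```

You could call it on the lines of your species list, opened with `open(path, encoding="utf-8")`; a non-breaking space shows up as 'U+00A0'.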

I keep getting warned about server errors

This is nothing to worry about, unless it crashes phyloGenerator or causes it to exit. If either of those things happens, consider not downloading as many species (more than six hundred seems to cause it problems, and I've not designed this program to handle that many), but please do send me an email about it.

Can I run two instances of phyloGenerator at a time?

Sorry, you can't. Only one run at a time, as the temporary files phyloGenerator uses can get confused and you can get strange results.

How do I make the command prompt larger? (Windows only)

Click on command prompt's icon (top-left of the window), click properties, then go to layout and change the screen width.

I don't understand what this program does! Help!

Have a read through the walkthrough. If that doesn't help, drop me an email. I'm afraid I can't explain the whole of phylogenetics to you (sorry!) but I'm quite likely to help you if you send me a polite email. Even more so if you promise me a beer!

I don't think this program is a good idea. It doesn't include method X / philosophically it's a bad idea / etc.

I probably agree with you. Please, send me an email and let's talk about it. Maybe we can improve the program, but at the very least I'd appreciate your feedback. For the record, I certainly don't think this program is a replacement for phylogeneticists.

It doesn't work! Help!

Oh no! Send me a copy-pasted version of everything you did and got back from phyloGenerator, any files you gave to phyloGenerator, and details about your computer (Mac? Windows? What version?). I'm always grateful for feedback (even errors!), and I'll try and help you as soon as I can (I normally reply to emails on the same day they're sent, but remember I'm in the US so time zones may be a factor). If you're on a Windows computer, please try running the program from 'command prompt' first (see above, third FAQ entry) as copy-pasting the error message will really help me help you. Make sure you read all the FAQs above first, though!
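If you have Python 3 installed and aren't sure how to describe your computer, the standard library can produce a short summary to paste into your email. This is only a convenience sketch; `system_report` is a made-up name and nothing in pG requires it.

```python
import platform
import sys


def system_report():
    """Return a short description of this computer for a bug report."""
    return "\n".join([
        "System: " + platform.system() + " " + platform.release(),
        "Details: " + platform.platform(),
        "Python: " + sys.version.split()[0],
    ])


print(system_report())
```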