| 301 | === 6/24/2010 === |
| 302 | I forgot to log a bunch of changes: |
| 303 | * Found a bug introduced when I reduced the wait time for the experiment to complete. Since the connect retry delay was rand(60)+(60), my log copies would copy over |
| 304 | incomplete files, because the copy waited only 30 seconds. I've redone the numbers: I now wait only rand(20) before attempting to reconnect, but try 3 times. |
| 305 | I wait 90 seconds before trying to copy the log file, which should give me enough time to capture the retries. |
| 306 | * There was a cascade of query failures because I was searching the testbed table for the testbed id using the short domain name instead of the FQDN. |
| 307 | The name was given to me by the gatherer; I've since modified the gatherer to use the FQDN. All the other queries depended on the testbed |
| 308 | id, so the error propagated down. |
| 309 | * This error, however, exposed a specific flaw in how I handled empty query results. I indicate failed queries by returning nil instead of an array. |
| 310 | The only reason I caught the error was that I tried to flatten the nil return. I've updated this to raise an exception if a nil query result occurs for |
| 311 | any of the members of the Identify class. Not being able to identify the node should be a fatal error. This exception is unhandled, so it will |
| 312 | propagate up to the main block, where it gets logged and terminates the script. |
| 313 | * Also added a little bit of logging to the gatherer, but not much. I should really fix its error handling. |
| 314 | * Noticed that sometimes I get an XML create object error. I'll have to figure out why that's happening. It's probably due to the gatherer not completing |
| 315 | properly, but now I should be able to find the nodes where it happens. |
| 316 | * Trying to stick to the logging/exception raising convention of setting the error text to: "class.method - error" |
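
The nil handling, the bounded reconnect, and the error text convention described in the entries above can be sketched roughly as follows. This is only an illustration, not the actual script: the `Identify` constructor, the `run_query` callable, the `Retry` module, and the `sleeper` parameter are all hypothetical names, and the language (Ruby) is inferred from the `rand`, `nil`, and `flatten` mentions.

```ruby
# Hypothetical sketch; the real Identify class and query layer are not shown in this log.
class Identify
  # run_query is an assumed callable returning an Array of rows, or nil on failure.
  def initialize(fqdn, run_query)
    @fqdn = fqdn
    @run_query = run_query
  end

  # Look up the testbed id by FQDN. A nil result is fatal, so raise instead of
  # letting a later call such as nil.flatten fail by accident.
  def testbed_id
    rows = @run_query.call("testbed", @fqdn)
    raise "Identify.testbed_id - query returned nil for #{@fqdn}" if rows.nil?
    rows.flatten.first
  end
end

# Hypothetical helper for the reconnect policy: up to `tries` attempts,
# sleeping rand(max_delay) seconds between attempts.
module Retry
  def self.connect(tries: 3, max_delay: 20, sleeper: ->(s) { sleep(s) })
    attempts = 0
    begin
      attempts += 1
      yield
    rescue StandardError => e
      # Error text follows the "class.method - error" convention.
      raise "Retry.connect - gave up after #{attempts} tries: #{e.message}" if attempts >= tries
      sleeper.call(rand(max_delay))
      retry
    end
  end
end
```

Neither exception is rescued here, so both would propagate up to whatever top-level block logs the error and exits. The `sleeper` argument exists only so the delay can be stubbed out when exercising the retry path.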