
Data Mining
Community structure graph derived from observations of seabirds and whales in the North Atlantic.
What is data mining?
Data mining:
- is the process of finding new information from existing collections of data
- draws on a variety of tools and techniques
- uses data from many different sources in order to exploit relationships between those data.
Why are we interested in data mining?
Collecting data from polar regions is difficult and expensive. Through data mining, the Data Centre seeks to make the data that we hold more useful to the Antarctic community.
Data mining enhances the value of the data held by the Data Centre in several ways:
- analysing multiple data sets together to uncover information that would not be obvious from analysing any single data set
- using exploratory techniques to uncover new information from the data
- producing actionable information and "end product" data from low-level scientific data
- generating a better understanding of the holdings of the Data Centre, including data management information such as:
  - data errors, and duplicated or missing information
  - linkages between databases
  - insights into ways of improving our data acquisition procedures.
Data Mining Projects
Current data mining projects
- Visualising field trips and track usage on Macquarie Island
- Using the Wii Remote for data exploration with Matlab
- A large-scale model of cumulative light exposure of sea ice in East Antarctica
- Investigation of interannual variations in ice-edge chlorophyll blooms in East Antarctica
- Network-based visualisation and exploration of scientific data
- Network-based methods for community ecology
- Web-based animations of ARGOS satellite tracks (animal movements) with physical environmental data such as sea ice
- Involvement in marine bioregionalisation exercises, in the Southern and global oceans
Past data mining projects
Publications
Request a copy of any of these publications.
Journal and conference publications
| 1. | Raymond B., Hosie G. (2009) Network-based exploration and visualisation of ecological data. Ecological Modelling | Details | Preprint |
| 2. | Raymond B., Meiners K., Fowler C.W., Pasquer B., Williams G.D., Nicol S. (2009) Cumulative solar irradiance and potential large-scale sea ice algae distribution off East Antarctica (30°E-150°E). Polar Biology | Details | Metadata & data link |
| 3. | Williams G.D., Nicol S., Raymond B., Meiners K. (2008) Summertime mixed layer development in the marginal sea ice zone off the Mawson coast, East Antarctica. Deep-Sea Research Part II | Details | |
| 4. | Lawton K., Kirkwood R., Robertson G., Raymond B. (2008) Preferred foraging areas of Heard Island albatrosses during chick raising and implications for the management of incidental mortality in fisheries. Aquatic Conservation: Marine and Freshwater Ecosystems | Details | |
| 5. | Pinkerton M.H., Smith A.N.H., Raymond B., Hosie G.W., Sharp B. (2008) Extrapolating Continuous Plankton Recorder Data through the Southern Ocean using Boosted Regression Trees. CCAMLR-WG-SAM 2008 | Details | |
| 6. | Martin-Smith K., O'Brien P., Raymond B., Constable A. (2007) Summary factsheets for bioregionalisation of the Southern Ocean - examples from the Indian Ocean Sector (Area 58). CCAMLR Workshop on Bioregionalisation. Brussels, Belgium, August 2007 | Details | |
| 7. | van den Hoff J., Burton H., Raymond B. (2007) The population trend of southern elephant seals (Mirounga leonina) at Macquarie Island (1952 - 2004). Polar Biology | Details | |
| 8. | Grant S., Raymond B. (2007) Data for bioregionalisation of the Southern Ocean. CCAMLR Workshop on Bioregionalisation. Brussels, Belgium, August 2007 | Details | |
| 9. | Raymond B., Belbin L. (2006) Visualisation and exploration of scientific data using graphs. Data Mining, Lecture Notes in Computer Science | Details | PDF (requires subscription) |
| 10. | Rhodes M., Wardell-Johnson G.W., Rhodes M.P., Raymond B. (2006) Applying network analysis to the conservation of habitat trees in urban environments: a case study from Brisbane, Australia. Conservation Biology | Details | PDF (requires subscription) |
| 11. | Meiners K., Pasquer B., Raymond B. (2006) On the large-scale distribution of sea-ice algae off East Antarctica and the importance of sea-ice thickness and snow cover. International Workshop on Antarctic Sea Ice Thickness, Hobart, 5-7 July 2006. | Details | |
| 12. | Raymond B., Constable A., Sokolov S. (2006) Network approaches to marine regionalisation. Network Theory Working Group Meeting 3, CSIRO Sustainable Ecosystems, Canberra, June 2006. | Details | |
| 13. | Burton H.R., Venegas S., Van den Hoff J., Raymond B., Curran M. (2006) The annual numbers of Leopard Seals (Hydrurga leptonyx) sighted at Macquarie Island (over 56 years) are correlated significantly with periodic flux of sea-ice concentration south-east of the island. Abstract of the SCAR Open Science Conference, Hobart, 12-14 July 2006 | Details | |
| 14. | Van den Hoff J., Burton H., Raymond B., Bester M. (2006) Long-term changes in the population status of Southern Elephant Seals (Mirounga leonina) at Macquarie Island, 1952-2005. Abstracts of the SCAR Open Science Conference, Hobart, 12-14 July 2006 | Details | |
| 15. | Raymond B., Meiners K., Curran M., van Ommen T. (2006) A conceptual model of the large-scale distribution of sea ice algae off East Antarctica. Abstracts of the SCAR Open Science Conference, Hobart, 12-14 July 2006. | Details | |
| 16. | Raymond B., Constable A. (2006) Regionalisation of the Southern Ocean: A statistical framework. CCAMLR WG-EMM-06/37 Agenda Item 5 & 6. | Details | |
| 17. | Woehler E.J., Raymond B., Watts D.J. (2006) Convergence or divergence: where do short-tailed shearwaters forage in the Southern Ocean? Marine Ecology Progress Series | Details | |
| 18. | Grant S., Constable A., Raymond B., Doust S. (2006) Bioregionalisation of the Southern Ocean: Report of Experts Workshop (Hobart, September 2006) | Details | PDF and supplementary material |
| 19. | Raymond B., Hosie G., Woehler E. (2006) Structured graphs for visualisation and exploration of biodiversity data. 5th International Conference on Ecological Informatics, Santa Barbara, California, December 2006. | Details | |
| 20. | Raymond B., Rhodes M., Wardell-Johnson G., Stark J. (2005) Network-based visualisation: exploring case studies of bat roost networks and benthic assemblages. Network Theory Working Group Workshop II, Canberra, March 8-9. | Details | |
| 21. | Constable A.J., Candy S.J., Raymond B. (2005) Examination of the characteristics of the fishery for Dissostichus eleginoides in the CCAMLR statistical subarea 48.3 and its implications on estimating trends in catch per unit effort. CCAMLR WG-FSA-SAM-05/17 | Details | |
| 22. | Constable A.J., Ball I., Raymond B., Candy S., Williams R., Dunn A. (2005) Evaluating methods to assess yield of patagonian toothfish (Dissostichus eleginoides) in CCAMLR Division 58.5.2. | Details | |
| 23. | Meiners K., Raymond B., Williams G., Massom R., Nicol S. (2005) A conceptual model of the large-scale distribution of sea ice algae off East Antarctica during the autumn-winter transition. Conference: Dynamic Planet, Cairns, August 22-26 | Details | |
| 24. | Raymond B., Watts D.J., Burton H., Bonnice J. (2005) Data Mining and Scientific Data. Arctic, Antarctic, and Alpine Research | Details | Abstract |
| 25. | Cunningham L., Raymond B., Snape I., Riddle M.J. (2005) Benthic diatom communities as indicators of anthropogenic metal contamination at Casey Station, Antarctica. Journal of Paleolimnology | Details | PDF (requires subscription) |
| 26. | Raymond, B., Hindell M., Worby T., Williams G., Meiners K., Hosie G., Adams N., Woehler E. (2005) Ecological Change in East Antarctica. Poster presented at International Workshop - Ecological change in East Antarctica - Stochastic Variability, Cycles or Regime Shifts? Hobart, 5-7 September 2005 | Details | |
| 27. | Raymond B., Belbin L. (2004) Visualisation and exploration of scientific data using graphs. Proceedings of the Third Australasian Data Mining Conference, December 2004, Canberra, Australia | Details | |
| 28. | Emmerson L., Raymond B., Southwell C. (2004) Modelling availability bias using existing time series count data: Adélie penguins as a case study. CCAMLR WG-EMM-04/54. Agenda Item No 6.1 | Details | Abstract PDF |
| 29. | Raymond B., Belbin L., Stark J. (2004) Graphical methods for the exploration of ecological databases. 2004 Meeting of the Ecological Society of Australia, Adelaide, December 2004 | Details | Abstract |
| 30. | Raymond B. (2004) Data mining: making the most of polar and oceanographic information in the 21st century. Proceedings of the 30th Annual Conference of the International Association of Aquatic and Marine Science Libraries and Information Centers, Hobart, September 2004 | Details | Abstract |
| 31. | Woehler E.J., Raymond B., Watts D.J. (2003) Decadal-scale seabird assemblages in Prydz Bay, East Antarctica. Marine Ecology Progress Series. | Details | |
| 32. | Raymond B., Woehler E.J. (2003) Predicting seabirds at sea in the Southern Indian Ocean. Marine Ecology Progress Series | Details | PDF, Supplementary material |
| 33. | Woehler E., Raymond B., Watts D. (2002) Long-term study analyses seabird communities. Australian Antarctic Magazine 4, Spring 2002 | Details | |
| 34. | Raymond B., Woehler E.J. (2002) Mining Antarctic scientific data: a case study. Proceedings Australasian Data Mining Workshop, 3rd December 2002, Canberra, Australia | Details | |
Draft or in-press publications
- Schwarz, J.N., Raymond, B., Williams, G., Marsland, S., Pasquer, B., Mongin, M., and Wright, S. (2009) Climatological anomalies in wind, sea surface temperature, sea-ice and chlorophyll concentrations during the BROKE-West survey. Deep-Sea Research II, in press.
- Woehler, E.J., Raymond, B., Boyle, A., and Stafford, A. (2009) Seabird assemblages observed during the BROKE-West survey of the Antarctic coastline (30°E–80°E), January–March 2006. Deep-Sea Research II, in press.
Software
The Polar Toolbox is a collection of Matlab functions written for various analyses of Antarctic data, although they may also be useful for other data. Most of this code has been tested under Matlab 2008a. It may also run under other applications such as Octave, a freely available, mostly Matlab-compatible package.
All code here is experimental and made freely available. Before downloading you will need to register (free!) with the Australian Antarctic Data Centre. This allows us to direct our development attention to the most active files.
Toolbox contents
% Polar Toolbox: A Matlab toolbox for various Antarctic analytical tasks
% Ben Raymond ben dot raymond at aad dot gov dot au
% September 2009
%
% aloc - Belbin's ALOC non-hierarchical clustering algorithm
% angle_normalise - normalises angles to the range -pi to pi
% bezierfit - cubic bezier curve fitting, given four control points
% cellmax - the maxima of a cell vector of identically-sized matrices
% cellmean - the mean of a cell vector of identically-sized matrices
% cellstd - the SD of a cell vector of identically-sized matrices
% cluster_clara - clustering using R's CLARA function. Requires R and the matlabRlink toolbox to be installed
% cutree - cuts a dendrogram and returns the group membership vector
% dump_csv - write numeric matrix to comma separated text file
% gebco_colormap - colormap similar to that used on GEBCO bathymetric charts
% gb - builds network structure from data matrix, following Raymond & Hosie 2009
% get_ice - retrieves ice data from binary files as obtained from NSIDC/seaice.de/etc
% get_icemotion - retrieves ice motion data from binary files as obtained from NSIDC
% get_sst - retrieves sea surface temperature data from binary files as obtained from NOAA etc
% icemotion_easegrid_ll2rc - converts longitude/latitude coordinates to row/column coordinates in the ice motion EASE grid
% indicator_spp - calculates indicator species from clusters, following Dufrene & Legendre (1997)
% join - join a cell array of strings or numeric vector into single string
% mean_nonan - calculates means, ignoring NaN elements
% ndvi_colormap - colormap similar to that used to show normalised vegetation index images
% proplim - confidence interval of a proportion m/N
% ranks - ranks of x adjusted for ties
% slidingmean - applies a sliding mean to a vector
% slidingmedian - applies a sliding median to a vector
% spearman - computes Spearman's rank-order correlation coefficient for two vectors x and y
% split - split a string into substrings
% std_nonan - calculates standard deviations, ignoring NaN elements
% streampatch - render 2-d streamlines using patch arrow objects
% sunriseset - calculates sunrise and sunset times
% wiifigure - allows a Matlab figure to be panned, zoomed, and rotated by a Wii remote device
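
As an illustration only, the one-line descriptions of a few of the simpler utilities listed above can be made concrete with a short sketch. The code below is Python rather than Matlab and is not taken from the Polar Toolbox. The function names mirror the listing, but the signatures, the tie convention (tied values share their average rank), and the handling of empty or constant input are all assumptions.

```python
import math


def angle_normalise(theta):
    """Wrap an angle in radians to the range -pi to pi."""
    # atan2 of the sine and cosine returns the equivalent angle in [-pi, pi].
    return math.atan2(math.sin(theta), math.cos(theta))


def mean_nonan(values):
    """Mean of the values, ignoring NaN elements (NaN if nothing is left)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def ranks(x):
    """1-based ranks of x; tied values share their average rank (assumed convention)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with x[order[i]].
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank-order correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return math.nan  # undefined when either input is constant (assumed behaviour)
    return cov / (sd_x * sd_y)
```

For example, `ranks([10, 20, 20, 30])` gives `[1.0, 2.5, 2.5, 4.0]`, and `mean_nonan([1.0, float("nan"), 3.0])` gives `2.0`. Computing Spearman's coefficient as a Pearson correlation on tie-adjusted ranks keeps it well defined when ties are present, which the shortcut formula based on squared rank differences does not.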




