Australian Company Announcements / Searching

The enhanced searching features advised in an earlier announcement are now operational for SIRCA’s Australian Company Announcements (ACA) data set. ACA’s search syntax now lets you construct complex queries. The examples below show some of the things you can do.

Entering one or more words in the search field makes ACA search for documents that contain any of the words. For example, searching for

earnings quarterly

will return documents that contain either or both of the words “earnings” and “quarterly”.

You can specify that a particular word must or must not appear in search results by prefixing it with a “+” or “-” respectively. For example, searching for

+mining exploration -gold

will only return documents that contain the word “mining” and do not contain the word “gold”. Results may or may not contain the word “exploration”, but those that do will be considered more relevant and will appear first in the results list.
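If you generate queries from a script, the prefix rules above can be captured in a few lines. The following is a minimal Python sketch only. The function name build_query and its parameters are hypothetical and are not part of ACA or Lucene; the function simply assembles the text you would type into the search field.

```python
def build_query(must=(), should=(), must_not=()):
    """Assemble a query string using the "+" and "-" prefixes.

    Hypothetical helper for illustration; it does not contact ACA.
    """
    parts = [f"+{term}" for term in must]        # terms that must appear
    parts += list(should)                        # optional terms, used for ranking
    parts += [f"-{term}" for term in must_not]   # terms that must not appear
    return " ".join(parts)


query = build_query(must=["mining"], should=["exploration"], must_not=["gold"])
print(query)  # +mining exploration -gold
```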

Searches can also combine words using the AND and OR keywords (NB: These must be entered in caps). In addition, parentheses can be used to group search terms. For example, searching for

kalgoorlie AND (gold OR silver)

will return documents that contain the word “kalgoorlie” and either or both of “gold” and “silver”.
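To see the boolean logic of this query spelled out, the same condition can be written directly in Python and applied to text you already have. This is an illustrative sketch only: the function name matches_query and the simple letters-only word splitting are assumptions, and ACA's own tokenisation and ranking may differ.

```python
import re


def matches_query(text):
    """Return True if text satisfies: kalgoorlie AND (gold OR silver)."""
    # Lower-case the text and split it into words made of letters only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "kalgoorlie" in words and ("gold" in words or "silver" in words)


print(matches_query("Kalgoorlie silver project update"))  # True
print(matches_query("Kalgoorlie nickel project update"))  # False
```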

It is possible to search for a phrase by enclosing it in quotes. For example, searching for

"chinese government"

will only return documents that contain that exact phrase.

You can use the wildcard characters “*” (to match any number of letters) or “?” (to match a single letter). For example, searching for

chin*

will match “china” and “chinese”, as well as any other word starting with “chin”. Wildcard characters can be used at the end of a search term or in the middle, but not at the start.
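The behaviour of the two wildcard characters can be tried out locally with Python's standard fnmatch module, which uses the same “*” and “?” conventions for plain words. This is a sketch under assumptions: the function name wildcard_matches is hypothetical, the leading-wildcard check mirrors the restriction described above, and the sample words are invented for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a pattern containing "*" or "?"."""
    # Wildcards are not permitted at the start of a search term.
    if pattern[:1] in ("*", "?"):
        raise ValueError("wildcards are not allowed at the start of a term")
    return [word for word in words if fnmatchcase(word, pattern)]


sample = ["china", "chinese", "chain", "machine"]
print(wildcard_matches("chin*", sample))  # ['china', 'chinese']
print(wildcard_matches("ch?na", sample))  # ['china']
```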

You can even do fuzzy and proximity searches.

A fuzzy search finds words that are similar to the one you enter, rather than only exact matches. For instance, a fuzzy search for “gold” might find “sold”, “bold”, “told” or “sole”, depending upon the degree of fuzziness applied. The degree of fuzziness is a parameter you can specify.

gold~

is how you request a fuzzy search for “gold” and

gold~0.8

defines a degree of fuzziness. Values closer to one are more precise and those nearer to zero are less precise. The default value, when no degree is specified, is 0.5.
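Lucene's fuzzy matching is based on edit distance, that is, the number of single-letter insertions, deletions or substitutions needed to turn one word into another. The sketch below computes that distance in plain Python to show why “sold” is a closer match to “gold” than “sole” is. It is an illustration of the idea only: the function name edit_distance is hypothetical, and the code does not reproduce how Lucene converts a distance into the 0 to 1 value used in the query.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words (standard dynamic programme)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(edit_distance("gold", "sold"))  # 1
print(edit_distance("gold", "sole"))  # 2
```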

A proximity search finds matches where words occur within a specified number of words of each other. For example, you may wish to find announcements with “cash” and “share” within six words of each other. Then you would specify

"cash share"~6
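The idea behind a proximity search can be approximated in Python when checking downloaded text yourself. This is a rough sketch under stated assumptions: the function name within_distance is hypothetical, the gap is measured simply as the difference between word positions regardless of word order, and Lucene's own calculation of the distance may differ in detail.

```python
import re


def within_distance(text, first, second, max_gap):
    """Return True if the two words occur within max_gap positions of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)


text = "The offer comprises cash of ten cents for each share held"
print(within_distance(text, "cash", "share", 6))  # True
print(within_distance(text, "cash", "share", 5))  # False
```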

ACA’s search functions are now implemented using Apache Lucene. See the Lucene Documentation for a complete description of the search syntax.

In addition to the new search capabilities, ACA now delivers all available text conversions as part of the Download Results link. Previously that link only delivered PDF files when these were available. Now, the text conversions of those files are also provided.

Researchers should note that these conversions are not always accurate. Text conversions result from a process that does its best to find usable text in PDF documents. The worst conversions are not used, but some degree of error must be accepted in order to make text searching possible, so do not expect these text conversions to be error free. The original PDFs are provided so you can correct errors in the conversions of announcements that are particularly important to you.

Despite these limitations, we are sure that access to our text conversions means you will now be better able to routinely process Australian company announcements for the statements that are most relevant to your research. The new search tools should help you find those announcements more quickly.

Enhancements to Australian Company Announcements database

Sirca will launch a new extension of its Australian Company Announcements product on July the 4th. The product currently allows searching of all Australian company announcements through a simple search interface.

In the coming release, users will be able to perform rich searches using the Lucene Search Syntax. Users will also be able to retrieve the ‘OCR-ed’ version of the PDF in plain text.

This is the first in a series of imminent releases of new and innovative extensions to the Sirca product suite.

Australian Company Announcements / IPOs

The Australian Securities Exchange (ASX) has supported Sirca since its inception, and has kindly made a broad array of its datasets available to us to help our mission of supporting academic research into the financial markets. These datasets include the feed of company announcements resulting from the ASX’s listing rules under the continuous disclosure regime. As a result, Sirca has developed an online database of ASX listed company announcements with an archive going back to 1992. This is available as an additional online resource called “Australian Company Announcements” for Sirca academic members and subscribers.

Sirca has developed software which exposes all available reference data for each disclosure, along with all the free text terms which appear in the bodies of the disclosures. Whilst the database is not as extensively hand-indexed as many of the offerings available from commercial providers, it is nevertheless a very impressive resource in terms of enabling access to raw data.

As an example of how the database can be used, we were recently asked whether we could help with some research which was attempting to nail down the definitive listing of gold sector IPOs on the ASX for the last 5 years, along with a statement about the amount of capital raised.

Each company seeking access to the ASX needs to complete and submit a range of documentation, which is filed through the ASX ComNews service. This includes prospectuses and information memoranda, along with standard ASX documentation. The ASX also issues official documents which indicate that an entity has been admitted to its official list. Using the way the ASX categorises these documents, combined with Sirca’s process of exposing the text of the disclosures, it was possible to narrow down the overall list of IPOs to just those pertaining to the gold sector. Furthermore, the researcher was able to access the actual disclosures to cross-check the amount of capital raised.