Preferred embodiments of the present invention comprise methods and software for processing text documents and extracting the chemical data contained therein.

Names, chemical formulas and structure diagrams are the language of chemistry. A need for nomenclature arises when chemists must convey information about compounds in speech and writing, in the latter case usually when an unambiguous and unique structural diagram is for some reason unsuitable or cannot be used.

The nomenclature used to describe chemical structures is a language and can therefore, when it is translated into another representation, be processed using linguistic methods.

The human mental process of getting from a chemical name to the structure seems to be a rule-based linguistic method. As in linguistics, there is a conflict between the pragmatists, who consider any name satisfactory as long as it conveys the intended meaning, and the purists, who insist that the rules must be respected. Unfortunately for the computer, the pragmatists have the upper hand.

Typically, this includes display on a computer, both for local computing and for distributed, mainly web-based, computing on intranets and the Internet.

For clarity in the selection of preferred names, the two most important producers and distributors of chemical information, Chemical Abstracts Service http: These rules were necessary because the IUPAC recommendations frequently allow more than one name for a particular chemical compound.

In addition, trivial and trade names, being shorter and more concise, have successfully replaced systematic names for a number of compounds that are of commercial importance or the subject of public interest. The IUPAC recommendations were deliberately formulated to allow considerable freedom in their application, and in many cases they are not fully carried through to their logical conclusion.

In practice, this means that a given structure does not necessarily correspond to a single correct name.

Therefore, the specific “dialects” supported by CAS and Beilstein can both still represent systematic nomenclature, no matter how far apart they are. This is the biggest weakness of the nomenclature as far as the use of computers is concerned. It has impeded a solution to the difficulties in creating an unambiguous nomenclature standard.

As long as such a standard does not exist, systematic nomenclature remains largely foreign to the chemist in practice. But even if some kind of consensus were reached and an unambiguous nomenclature standard were worked out and adopted, there would still be the problem of the complexity of the nomenclature.

It is generally agreed that the IUPAC nomenclature is cumbersome and has a very large number of rules that are often very difficult to follow.

Frequently permissible alternatives in name assignment, contradictory recommendations, the lack of rules in certain areas and excessive freedom in interpreting the rules lead to ambiguity and to a nomenclature chaos of its own.

A fundamental problem in naming is that a correct name is not necessarily the only correct name for a structure. To complicate matters further, the rules for arriving at a correct name are, as explained above, complex, and very few chemists can handle them. What is worse, the major global centers for chemical documentation are not consistent, either internally or with one another, in their application of the rules.

The structure shown in FIG. 5 illustrates the problem. In principle, there is nothing wrong with a variety of names for a structure. As long as each name is an appropriate representation of the structure, there are few real problems, provided that chemists are reasonably familiar with the rules in the passive sense, that is, able to interpret a name as opposed to creating one.

The conventional attempted use of nomenclature, however, was much broader in scope. Before computerization, the ideal was to index every important structural subunit of a structure using the nomenclature.


This method is based on chemical experience and is by no means bad. But it contains the limits of its own applicability, insofar as the vocabulary used has never been fully standardized in a strictly defined sense and the intuitive subdivision has never been completely freed of internal contradictions.

This means that the use of indices based on names or parts of names remains a risky undertaking to this day. To use the example above, it is not immediately apparent to most chemists whether they should look under B for benzene, E for ethane or A for acetaldehyde. Such names could then be reversed and unambiguously translated back into the same structural diagram. Unfortunately, this is not the case. As discussed above, trivial or trade names that are shorter and more concise have successfully replaced systematic names for a number of chemical compounds that are of commercial importance or the subject of public interest.

A comprehensive computer program designed to work with real chemical nomenclature must be able to convert the semi-systematic, unsystematic, outdated, ambiguous or otherwise “corrupted” names that are the reality of current communication in chemistry.

The translation of chemical names into structures can generally be treated as a problem of computerized syntactic and semantic analysis of nomenclature as an artificial language.

To obtain such an analysis, a formal grammar of the nomenclature must first be derived from the informal rules.

From the linguistic point of view, it is an interesting observation that the basic language of all naming systems in organic chemistry is essentially the same. Although two chemists may name the same compound differently, both will be able to draw the same structural diagram.

In this sense, the use of different naming practices mentioned above corresponds more to the problem of processing dialects than to the treatment of separate and distinct languages. Knowledge of the formal grammar of the chemical language requires the creation of a dictionary of fragments, so-called morphemes, from which names can be built, and a statement of the appropriate syntax rules that define this construction.

For example, a single rule can treat the fragments “meth”, “eth”, “prop”, etc. in the same context at the same time. The morphemes must then be located and recognized in a given name. The process comprises first parsing the name by breaking it down into the longest possible text fragments and then submitting the fragments to lexical analysis in order to identify them according to a set of syntax rules, using the predefined dictionary.
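The longest-fragment decomposition and dictionary lookup described above can be sketched roughly as follows. This is only a minimal illustration: the dictionary entries, the lexical class names and the functions `tokenize` and `is_simple_name` are assumptions made for the example and are not taken from the described software.

```python
# Hypothetical morpheme dictionary: fragment text -> lexical class.
# Entries and class names are illustrative assumptions only.
MORPHEMES = {
    "meth": "STEM", "eth": "STEM", "prop": "STEM", "but": "STEM",
    "an": "SATURATION", "en": "SATURATION", "yn": "SATURATION",
    "e": "SUFFIX", "ol": "SUFFIX", "al": "SUFFIX", "oic acid": "SUFFIX",
    "yl": "SUBSTITUENT",
}
MAX_LEN = max(len(m) for m in MORPHEMES)


def tokenize(name):
    """Split a name, left to right, into the longest fragments found in the dictionary."""
    text = name.lower()
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            fragment = text[pos:pos + length]
            if fragment in MORPHEMES:
                tokens.append((fragment, MORPHEMES[fragment]))
                pos += length
                break
        else:
            raise ValueError(f"unrecognized fragment at position {pos}: {text[pos:]!r}")
    return tokens


def is_simple_name(tokens):
    """One toy syntax rule: any STEM, then a SATURATION marker, then a SUFFIX."""
    return [cls for _, cls in tokens] == ["STEM", "SATURATION", "SUFFIX"]
```

Because the rule in `is_simple_name` refers only to the class `STEM`, it covers “meth”, “eth” and “prop” at once, which is the point of grouping morphemes by context. A greedy left-to-right match is the simplest possible strategy; real nomenclature would need backtracking and a far larger dictionary.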

Considering the numerous semi-systematic fragments retained by IUPAC (e.g. acetic acid instead of the systematic ethanoic acid), a functioning analysis algorithm has to work with an extremely large dictionary of morphemes. Once a valid name has been successfully analyzed (the problem of which names count as valid was already mentioned above), appropriate routines must be called to process the semantic information as each syntax rule is applied.

The morphemes located in the name are then matched with corresponding structural fragments stored in compact form as small connection tables. These are then combined and arranged into the final, complete connection table (CT) corresponding to the full name. Graphics routines transform the connection tables into structural diagrams and prepare them for output to data terminals or in print 10.
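As a rough sketch of how small fragment connection tables might be merged into one complete table, consider the following. The `ConnectionTable` class, the `STEMS` and `SUFFIX_FRAGMENTS` tables and the `assemble` function are hypothetical names introduced for this example; hydrogens are left implicit, and the attachment position is fixed at the first chain atom purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    atoms: list = field(default_factory=list)  # element symbols, hydrogens implicit
    bonds: list = field(default_factory=list)  # (atom_index, atom_index, bond_order)

    def attach(self, other, at, other_at=0, order=1):
        """Append a fragment, renumber its atoms, and bond atom `at` to its atom `other_at`."""
        offset = len(self.atoms)
        self.atoms.extend(other.atoms)
        self.bonds.extend((i + offset, j + offset, o) for i, j, o in other.bonds)
        self.bonds.append((at, other_at + offset, order))


def chain(n):
    """Connection table of an unbranched chain of n carbon atoms."""
    return ConnectionTable(["C"] * n, [(i, i + 1, 1) for i in range(n - 1)])


# Hypothetical fragment dictionaries (illustrative assumptions only).
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4}
SUFFIX_FRAGMENTS = {"e": None, "ol": ConnectionTable(["O"], [])}


def assemble(stem, suffix):
    """Build the full table for a saturated chain plus an optional suffix group on atom 0."""
    table = chain(STEMS[stem])
    fragment = SUFFIX_FRAGMENTS[suffix]
    if fragment is not None:
        table.attach(fragment, at=0)
    return table
```

For instance, `assemble("eth", "ol")` yields three atoms (two carbons and one oxygen) with one bond along the chain and one bond from the first carbon to the oxygen. A real converter would take the attachment position from locants in the name rather than assuming it.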

Transformations of the type described above can look back on a long tradition. The first use of a computerized grammar analysis process, with a dictionary of nomenclature terms that was very limited in comparison with the wide range of constructions allowed in IUPAC nomenclature, was made by Elliot.


At about the same time, Stilwell 13 and later Cooke-Fox et al. 14 reported on a very interesting grammar-based nomenclature translation for steroid nomenclature. Most of the research carried out so far on grammar-based translation of IUPAC nomenclature into structural diagrams was conducted by the team at the University of Hull 2,14. Apart from internal Beilstein memos and technical documents, there are no verified publications to which one could refer.

The format of the input chemical name accepted by VICA (written in the Pascal or Fortran programming language) was strictly defined for the syntax of systematic nomenclature as used in the “Beilstein dialect”: specific delimiting conventions, specific treatment of appended suffixes such as esters or amides, specific syntax for multicomponent structures, etc. Another interesting attempt in the area of algorithmic name conversion is Roxy, a system developed and programmed by Lawson 18.

Recently, a few interesting, practical and commercially available computer systems that translate nomenclature into connection tables have come onto the market. Its latest version is included in the structure editing package ChemDraw Ultra and the chemical office suite ChemOffice Ultra. Furthermore, the system cannot handle names of polymers or those of inorganic coordination complexes.

And for subtractive nomenclature (de-, des-, etc.). The paper by Brecher contains a detailed description and classification of the problems that confront anyone trying to develop an automatic nomenclature converter. These problems, according to Brecher, arise mainly from the ambiguity of current nomenclature practices.

According to ACD Labs 21, this program is able to generate chemical structures for names of most classes of general organic compounds, for many derivatives of basic parent natural product structures, and for semi-systematic and trivial names of general organic compounds.

The batch version of the name converter from ACD Labs, “Name to Structure Batch”, generates structures from systematic and non-systematic chemical names of general organic, some biochemical and some inorganic compounds.

The inputs for this program can. The program is also available for UNIX platforms. This is particularly important because most intranet systems run small chemical databases on UNIX minicomputers. Another converter of names into structures comes from ChemInnovation Software, Inc. The program is called Name Expert. The program is more academic than practical in nature, mainly because of an unacceptably low success rate. Shorthand, Kekulé or semi-structural formula.

It can also add labels to the corresponding atoms and groups. The latest version now supports limited stereochemistry and includes drug names and structures. Under no circumstances can it be considered a nomenclature tool for practical use in a company.

The program is relatively effective for strictly systematic IUPAC names, but for general nomenclature as found in the current literature, it achieves no more than a single-digit success rate. The chemical nomenclature, and in particular the organic nomenclature, which is published in the literature (journals, patents, technical documentation, etc.

The nomenclature that is considered “systematic” today is defined by the consensus of the views of its users.

A “correct name” does not exist. There are “reasonable” naming practices, including those that are limited to the Beilstein or CAS “dialects”.

Previous software for extracting information from text often produced unacceptable results in terms of accuracy and scope. To produce extractions of acceptable accuracy and scope, a human indexer was employed.