Talk:Indexed grammar

Mistake?

I'd just like to note that the grammar as written cannot be correct:

S -> aAfc
aAfc -> a(aAgc)fc
a(aAgc)fc -> a(a(aAgc)gc)fc
a(a(aAgc)gc)fc -> a(a(abBc)gc)fc

And now the derivation halts, because the only nonterminal B has no production - it's separated from its index by the intervening terminal c. --Peter Farago (talk) 06:33, 14 March 2009 (UTC)

The description of indexed grammars is actually incomplete; this problem doesn't actually arise because the indexes aren't rewritten in that fashion. I'm working on fixing it now. --Augur (talk) 17:22, 17 February 2010 (UTC)

Questions on the example

I may not completely understand this formalism, but it appears to me that in the section "Example", the third rule should be:

$T[\sigma g]\to T[\sigma ]b$

instead of:

$T[\sigma g]\to T[\sigma ]a~|~T[\sigma ]b$

and the rightmost part of the proposed derivation also seems incorrect; it should be:

$S[]\to S[f]\to S[fg]\to S[fgg]\to T[fgg]T[fgg]T[fgg]$

$\to T[fg]bT[fgg]T[fgg]\to T[f]bbT[fgg]T[fgg]\to T[]abbT[fgg]T[fgg]$

$\to abbT[fgg]T[fgg]\to \dots\to abbabbT[fgg]\to \dots\to abbabbabb$

instead of:

$S[]\to S[f]\to S[fg]\to S[fgg]\to T[fgg]T[fgg]T[fgg]$

$\to T[fg]bT[fgg]T[fgg]\to T[f]bbT[fgg]T[fgg]\to T[]abbT[fgg]T[fgg]$

$\to abbT[fgg]T[fgg]\to \dots\to abbabbT[]\to abbabbabb$

--Preceding unsigned comment added by 131.254.15.97 (talk) 10:18, 6 July 2010 (UTC)

I think anonymous is correct -- the example as currently stated is incorrect. --128.143.67.138 (talk) 00:50, 25 December 2010 (UTC)

I agree. Fixed. Clément Pillias (talk) 23:21, 27 November 2011 (UTC)

Source does not correspond to citation

In section Linear indexed grammars, Gazdar's work "Applicability of Indexed Grammars to Natural Languages" is cited as proving "Membership in a linear indexed grammar can be decided in polynomial time." I just read this paper quickly (accessible through Google Books [1]) and I have the impression that it does not state this. Mgalle (talk) 12:40, 14 September 2010 (UTC)

Definition?

I think a formal definition of IGs is needed, right after the introductory paragraph -- definitely before the first example. Could someone who is actually familiar with IGs provide one? UKoch (talk) 16:24, 26 December 2010 (UTC)

Deciding if a string is in an indexed grammar is NP-complete?

The reference for the sentence in the introduction to this article, "The problem of determining whether a string is recognized by an indexed grammar is NP-complete.", is Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, 1979. However, I am looking at the (very brief) section on indexed languages in that textbook, and no such statement is made. Therefore I am removing this statement from the article. J. Finkelstein (talk) 21:46, 19 January 2012 (UTC)

Needs cleaning-up

This article has become a jumble of sections ... it should be more clearly structured. Definition, examples, then linear indexed grammars, equivalences etc.

One thing was missing, though: the formal definition (from Aho's paper) has now been provided. Please double-check! [ɯ:] (talk) 16:29, 27 December 2012 (UTC)

Apparently, the definition uses different terminology than the examples; it should be adapted. In the examples, it should be made clear which productions are "index productions" and which are ordinary ones. I cannot see how the 3rd rule in the first example, viz. "T[σf]→T[σ]a", fits in either scheme (index or ordinary production), as both allow only a simple nonterminal on the left-hand side, but not a stack "[σ]" of index symbols, let alone "[σf]". Maybe somebody could explain that. - Jochen Burghardt (talk) 20:18, 4 November 2013 (UTC)

Settings of Hopcroft+Ullman 1979 added for comparison purposes

The definition of Aho 1968 in the article differs (slightly?) from that in Hopcroft+Ullman 1979. I put the latter here (see below), in case an indexed-grammar expert might wish to include it in the article or use it in some other way. The aⁿbⁿcⁿ example in the article refers to Hopcroft+Ullman 1979, but the grammar's first rule isn't admitted by the definition below. Therefore, I also append the example from Hopcroft+Ullman 1979.

Definition by Hopcroft+Ullman 1979
Formally, an indexed grammar^[1] is a 5-tuple G = ⟨N,T,F,P,S⟩ where

N is a set of variables or nonterminal symbols,

T is a set ("alphabet") of terminal symbols,

F is a set of so-called indices,

S ∈ N is the start symbol, and

P is a finite set of productions of one of the forms
1. A → α,
2. A → Bf, or
3. Af → α,

where A, B ∈ N are nonterminal symbols, f ∈ F is an index, and α ∈ (N ∪ T)^* is a string of nonterminal and terminal symbols.
Derivations are similar to those in a context-free grammar except that a nonterminal symbol may be followed by a string ("stack") of indices. Terminal symbols may not be followed by indices. When a production such as A → BC is applied, the index stack of A is attached to both B and C.
Formally, the relation ⇒ ("direct derivation") is defined on the set (NF^*∪T)^* of "sentential forms" as follows:

If A → X₁ ... X_n is a production of type 1, then β Aφ γ ⇒ β X₁φ₁ ... X_nφ_n γ, where φ_i is φ if X_i ∈ N is a nonterminal, and is the empty string ε if X_i ∈ T is a terminal symbol. That is, a copy of the rule's left hand side's index stack φ is attached to each nonterminal of the right hand side.

If A → Bf is a production of type 2, then β Aφ γ ⇒ β Bfφ γ. That is, the right hand side's index stack is obtained from the left hand side's stack by pushing f onto it.

If Af → X₁ ... X_n is a production of type 3, then β Afφ γ ⇒ β X₁φ₁ ... X_nφ_n γ, where again φ_i is φ if X_i ∈ N is a nonterminal, and ε if X_i ∈ T is a terminal symbol. That is, the first index is popped from the left hand side's stack, which is then distributed to each nonterminal of the right hand side.

As usual, the derivation relation ⇒^* is defined as the reflexive transitive closure of direct derivation ⇒. The language L(G) = { w ∈ T^*: S ⇒^* w } is the set of all strings of terminal symbols derivable from the start symbol.
Example by Hopcroft+Ullman 1979
The grammar G = ⟨ {S,T,A,B,C}, {a,b,c}, {f,g}, P, S ⟩ produces the language { aⁿbⁿcⁿ: n ≥ 1 }, where the production set P consists of

S → Tg        Af → aA        Ag → a
T → Tf        Bf → bB        Bg → b
T → ABC       Cf → cC        Cg → c

That language is known to be not context-free. An example derivation is S ⇒ Tg ⇒ Tfg ⇒ Afg Bfg Cfg ⇒ aAg Bfg Cfg ⇒ aAg bBg Cfg ⇒ aAg bBg cCg ⇒ aa bBg cCg ⇒ aa bb cCg ⇒ aa bb cc.

^ Hopcroft and Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. 1979. ISBN 0-201-02988-X. Here: Sect. 14.3, pp. 389-390. This section is omitted in the 2nd edition (2003).

Jochen Burghardt (talk) 14:40, 5 January 2014 (UTC)

The current definition is confusing, in any case. It states that the rules are of the form $A[\sigma ]\to \cdots$, with $\sigma$ a sequence, but the value of $\sigma$ is not used in defining derivation. I'd very much prefer if it were written as in H&U, or with $\cdots$ instead of $\sigma$. 158.37.5.111 (talk) 13:33, 1 February 2023 (UTC)

The stack sequence $\sigma$ is used, and manipulated, during derivation. The current stack value restricts the set of applicable productions. See the examples (which are written in H+U style). - Jochen Burghardt (talk) 14:51, 1 February 2023 (UTC)

Linear Indexed Grammars

My impression is that the definition of linear indexed grammars predates the work of Gazdar, although he may first have made the connection with mildly context-sensitive languages.

There is an article "Linear Indexed Languages" by Duske and Parchmann, Theoretical Computer Science 32 (1984), 47-60. They prove the fact that the class of languages L_1 introduced by N.A. Khabbaz in "A Geometric Hierarchy of Languages", J. Comput. Syst. Sci 8 (1974), 142-157 coincides precisely with what they call linear indexed languages: these are obtained by "controlling" a linear context-free grammar with a context-free language. Here a context-free grammar is linear, if the right side of every production contains at most one variable.

What baffles me in the current Wikipedia article is the formulation "by requiring that AT MOST ONE nonterminal in each production be specified as receiving the stack, whereas in a normal indexed grammar, ALL nonterminals receive copies of the stack" (emphasis mine) by which linear indexed grammars are distinguished from ordinary indexed grammars. I would have expected that there is at most one variable present, which therefore has to receive the stack. But perhaps Gazdar defined a different notion of linear indexed grammar, where one can choose among several variables where to put the stack? Unfortunately, I presently don't have access to Gazdar's paper. -- Preceding unsigned comment added by 92.76.140.60 (talk • contribs) 4 January 2014

I was lucky to look at p.71 of Gazdar's 1988 paper at Google books, where he says:

"nonterminal symbols are indicated by upper case letters (A, B, C), (...) possibly empty strings of terminals and nonterminals by W, W₁, W₂, etc., indices by lower case italics letters (i, j, k), and stacks of indices by square brackets and periods ([], [..], [i,..]), where [i,..] is a stack whose topmost index is i, [] is an empty, and [..] is a possibly empty stack of indices. (...) In the standard formulation, an indexed grammar can contain rules of three different sorts:

A[..] → W[..]
A[..] → B[i,..]
A[i,..] → W[..]

I shall refer to rules that have one or other of these three forms as H&U rules. The first type of rule simply copies the stack to all nonterminal daughters. The second type of rule pushes a new index onto the stack handed down to its unique nonterminal daughter. And the third type of rule pops an index off the stack and distributes what is left to its nonterminal daughters. (...) A compound symbol of the form A[..] means that the nonterminal A bears the stack [..]. A compound symbol of the form W[..] stands for a string of terminal and/or nonterminal symbols each nonterminal symbol of which bears the stack [..]. Terminal symbols cannot bear stacks."

"(...)" denotes an omission I made, while "[..]" is quoted from the paper. "H&U" refers to Hopcroft and Ullman (1979), in contrast to Aho (1968). Gazdar doesn't mention linear indexed grammars in his introduction, which I could read completely. Does that help? - Jochen Burghardt (talk) 20:38, 4 January 2014 (UTC)

The LIG of Gazdar and Vijay-Shanker is not the same notion as the same-named one from Duske and Parchmann. LIGs in the former sense are much more simply formed. Quoting from Kallmeyer's book: "An indexed grammar is called a linear indexed grammar (LIG) (Gazdar, 1988; Vijay-Shanker, 1987) if in a production A → α or Af → α the stack of A is copied only to one non-terminal in α." That's all there is to Gazdar/Vijay-Shanker's LIG. JMP EAX (talk) 12:27, 17 August 2014 (UTC)
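To make the difference concrete, here is a small Python sketch of the two ways of handing down the stack. It is my own illustration, not taken from Gazdar, Vijay-Shanker or Kallmeyer: the function names, the `heir` parameter (the position of the one nonterminal that receives the stack), and the choice to give the other nonterminals an empty stack are all assumptions made for the example.

```python
NONTERMINALS = set("ABC")  # hypothetical toy alphabet for this illustration


def copy_to_all(rhs, stack):
    """Ordinary indexed grammar: every nonterminal of rhs gets a copy of stack."""
    return [(x, stack) if x in NONTERMINALS else x for x in rhs]


def copy_to_one(rhs, stack, heir):
    """LIG-style: only the nonterminal at position heir receives the stack.

    Assumption: the remaining nonterminals start with an empty stack.
    """
    if rhs[heir] not in NONTERMINALS:
        raise ValueError("the stack can only be handed to a nonterminal")
    return [
        (x, stack if i == heir else "") if x in NONTERMINALS else x
        for i, x in enumerate(rhs)
    ]


print(copy_to_all("aBC", "fg"))          # B and C both carry the stack fg
print(copy_to_one("aBC", "fg", heir=2))  # only C carries fg, B gets an empty stack
```

With the right-hand side aBC, both B and C are present, so "at most one nonterminal receives the stack" is a weaker restriction than "at most one nonterminal occurs", which seems to be exactly the distinction between the two notions discussed above.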

Fairly WP:RANDYesque citation request

“which belongs to the [[mildly context-sensitive language|mildly context-sensitive]] classes. Membership in a linear indexed language can be decided in polynomial time.{{citation needed|reason=I couldn't find this property stated in Gazdar's paper. (However, I'm unable to access its pages 73,74,80,81,87,88,94.)|date=February 2014}}”

The definition of mildly context-sensitive language contains the requirement that it must be parseable in PTIME. JMP EAX (talk) 12:08, 17 August 2014 (UTC)

As far as I remember, Gazdar 1988 didn't speak about mild context-sensitivity either (on my accessible pages), so both statements would need a citation. On second thought, maybe they both can be established from weak equivalence to TAGs, but that didn't come to my mind in Feb. 2014.

BTW: As far as I understood, "WP:RANDYesque" means something like "being a non-expert and proud of it"; if that is not too wrong, I would appreciate it if you stopped attributing that to me; I try to be neither a non-expert nor proud. - Jochen Burghardt (talk) 13:02, 17 August 2014 (UTC)

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on Indexed grammar. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

Added archive https://web.archive.org/20070311042935/http://www.cogs.susx.ac.uk:80/research/nlp/gazdar/nlp-in-prolog/ch04/chapter-04-sh-1.6.3.html to http://www.cogs.susx.ac.uk/research/nlp/gazdar/nlp-in-prolog/ch04/chapter-04-sh-1.6.3.html#sh-1.6.3

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. --cyberbot II (Talk to my owner: Online) 05:38, 23 February 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Indexed grammar. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20060719224223/http://acl.ldc.upenn.edu/E/E93/E93-1042.pdf to http://acl.ldc.upenn.edu/E/E93/E93-1042.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.


Cheers. --InternetArchiveBot (Report bug) 00:37, 13 November 2017 (UTC)
