3.1: Some advanced topics in logic

Last updated
Save as PDF

Page ID: 16885

Craig DeLancey
SUNY Oswego via OpenSUNY

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

17. Some advanced topics in logic

17.1 What do we study in advanced logic?

Students may imagine that in more advanced logic we continue with first order logic, translating more complex sentences and using them in proofs. But, in advanced logic, we often turn toward quite different, and more significant, issues. This chapter is meant as a brief look forward. It will give you a sense of what various topics in advanced logic are like. This will hopefully encourage you to continue to study logic, and also, perhaps, this chapter can act as a bridge from what you have learned to what can come next.

This chapter will provide brief illustrations of the following topics:

17.2 Axiomatic propositional logic

17.3 Mathematical induction

17.4 The deduction theorem for propositional logic

17.5 Set theory

17.6 Axiomatic first order logic

17.7 Modal logic

17.8 Peano arithmetic

17.2 Axiomatic propositional logic

The kinds of logical systems we have been studying up to now are called “natural deduction systems”. In a natural deduction system, we add rules as we need them, and we make no attempt to specify the language in a compact form. As logic became much more sophisticated in the twentieth century, logicians and mathematicians began to study what logic could do. For example, are all logical truths provable? To answer a question like this, we must be much more precise about the nature of our logic. In particular, we need to boil it down to its essential elements. Then we can study what those essential elements are capable of doing.

We call this “the axiomatic approach”. It is ancient, and familiar to all of us who have studied some geometry and seen Euclid’s methods. But, in the last century, mathematicians developed a much more rigorous understanding of the axiomatic approach than is present in Euclid’s work. In this section, we will describe an axiom system for the propositional logic. I hope that you are pleasantly shocked to find how compact we can make the propositional logic.

We will use what are actually called “axiom schemas”. These are sentences in our metalanguage that describe the form of any number of other sentences. Any sentence that has the form of the axiom schema is an instance of the axiom. Our first three axiom schemas are:

(L1): (Φ → (Ψ → Φ))

(L2): ((Φ → (Ψ→χ)) → ((Φ → Ψ) → (Φ → χ)))

(L3): ((¬Φ→¬Ψ)→((¬Φ→Ψ)→Φ))

We suppose that we have some number of atomic sentences, P1, P2, P3 and so on (it is useful, at this point, to use a single letter, with subscripts, so we do not find ourselves running out of letters). We have one rule: modus ponens. We have one proof method: direct derivation. However, we have now a new principle in doing proofs: we can at any time assert any instance of an axiom. Thus, each line of our direct derivation will be either a premise, an instance of an axiom, or derived from earlier lines using modus ponens.

Later, we will loosen these restrictions on direct derivations in two ways. First, we will allow ourselves to assert theorems, just as we have done in the natural deduction system. Second, we will allow ourselves to apply principles that we have proven and that are general. These are metatheorems: theorems proved about our logic.

The semantics for this system are like those for the propositional logic: we assign to every atomic formula a truth value, and then the truth value of the sentences built using conditionals and negation are determined by the truth tables for those connectives.

Amazingly, this system can do everything that our propositional logic can do. Furthermore, because it is so small, we can prove things about this logic much more easily.

What about conjunction, disjunction, and the biconditional? We introduce these using definitions. Namely, we say, whenever we write “(Φ^Ψ)” we mean “¬(Φ→¬Ψ)”. Whenever we write “(ΦvΨ)” we mean “(¬Φ→Ψ)”. Whenever we write “(Φ ↔ Ψ)” we mean “((Φ→Ψ)^(Ψ→Φ))”; or, in full, we mean “¬((Φ→Ψ)→¬(Ψ→Φ))”.

We will introduce a useful bit of metalanguage at this point, called the “turnstile” and written “|―”. We will write “{Ψi, Ψj, …}|― Φ” as a way of saying that Φ is provable given the assumptions or premises {Ψi, Ψj, …}. (Here the lower case letter i and j are variables or indefinite names, depending on context—so “Ψi” means, depending on the context, either any sentence or some specific but unidentified sentence. This is handy in our metalanguage so that we do not have to keep introducing new Greek letters for metalanguage variables; it is useful in our object language if we want to specify arbitrary or unknown atomic sentences, Pi, Pj….) The brackets “{” and “}” are used to indicate a set; we will discuss sets in section 17.5, but for now just think of this is a collection of things (in this case, sentences). If there is nothing on the left of the turnstile—which we can write like this, “{}|― Φ” —then we know that Φ is a theorem.

Proving theorems using an axiomatic system is often more challenging than proving the same theorem in the natural deduction system. This is because you have fewer resources, and so it often seems to require some cleverness to prove a theorem. For this reason, we tend to continue to use the methods and rules of a natural deduction system when we aim to apply our logic; the primary benefit of an axiom system is to allow us to study our logic. This logical study of logic is sometimes called “metalogic”.

An example can be illustrative. It would be trivial in our natural deduction system to prove (P→P). One proof would be:

$\fitchprf{} { \subproof{\pline[1.]{P}[assumption for conditional derivation]}{ \pline[2.]{p}[repeat, 1] } \pline[3.]{(P \lif P)}[conditional derivation, 1-2] }$

To prove the equivalent in the axiom system is more difficult. We will prove (P1→P1). That is, we will prove {}|―(P1→P1). What we must do in this system is find instances of our axioms that we can use to show, via modus ponens, our conclusion. We will not be using the Fitch bars. The proof will begin with:

1. ((P1→((P1→ P1) →P1)) → ((P1→(P1→P1)) →(P1→P1))) instance of (L2)

To see how why we are permitted to assert this sentence, remember that we are allowed in this system to assert at any time either a premise or an axiom. We are trying to prove a theorem, and so we have no premises. Thus, each line will be either an instance of an axiom, or will be derived from earlier lines using modus ponens. How then is line 1 an instance of an axiom?

Recall that axiom (L2) is: ((Φ → (Ψ→χ)) → ((Φ → Ψ) → (Φ → χ)). We replace Φ with P1, and we get ((P1 → (Ψ→χ)) → ((P1 → Ψ) → (P1 → χ)). We replace Ψ with (P1→ P1) and we have ((P1 → ((P1→ P1) →χ)) → ((P1 → (P1→ P1)) → (P1 → χ)). Finally, we replace χ with P1, and we end up with the instance that is line 1.

Continuing the proof, we have:

1. ((P1→((P1→ P1) →P1)) →((P1→(P1→P1)) →(P1→P1)))	instance of (L2)
2. (P1→((P1→ P1) →P1))	instance of (L1)
3. ((P1→(P1→P1)) →(P1→P1))	modus ponens 1, 2
4. (P1→(P1→P1))	instance of (L1)
5. (P1→P1)	modus ponens 3, 4

That proof is a bit longer, and less intuitive, than our natural deduction proof. Nonetheless, it does illustrate that we can prove (P1→ P1) in this system.

We can recreate this proof at the level of our metalanguage, and show that (Φ→Φ). That would simply mean doing the same proof, where every P1 is replaced by Φ.

With the axiomatic propositional logic, it is not difficult to prove two important results about the logic. First, others have proven that this propositional logic is complete. Intuitively, a logical system is complete when all of the truths of that system are provable. In practice, we must carefully define for each particular logical system what completeness will mean. For the propositional logic, “completeness” means that all the tautologies are theorems. That is, every sentence that must be true is provable. This is in fact the case. Intuitively, a logical system is consistent if it is not possible to prove some sentence and also prove its denial. The propositional logic is also consistent.

Unfortunately, these proofs of completeness and consistency are a bit longer than we want to include in this chapter’s brief overview.^[13] However, to give a sense of how one goes about proving things about the axiomatic propositional logic, we can prove another result, the deduction theorem. It is helpful to quickly add to the axiomatic system additional results that simplify proofs. The most useful of these is the deduction theorem; but before we can prove the deduction theorem, we need to introduce a new proof method. We do this in the next section; and prove the deduction theorem in the section after that.

17.3 Mathematical induction

In mathematics, and in advanced logic, we often make use of a special form of reasoning called “mathematical induction”. (This name is unfortunate, since “induction” is also the name of a reasoning method that is not a deductive method. However, mathematical induction is a deductive method.) The idea of mathematical induction is very powerful, and proves to be one of the most useful methods for proving things about our logic. It can be used whenever the collection of things that we aim to prove something about is ordered in the right way. An example of a set that is ordered the right way is the natural numbers, where there is a least number (1), and each number after 1 has a unique predecessor.

Here is the method. Suppose we want to prove that a property Φ is had by every member of some collection of objects that can be ordered in the right way.

First, we prove the base case. This means that we find the least element(s) and prove that it has (or they have) the property in question.

Second, we make the induction hypothesis. We assume that some arbitrary object in our collection—the nth objection in our ordered collection—has the property Φ.

Third, we prove the induction step. We show that if the arbitrary nth object in our ordering of the objects in our domain has property Φ, then the next object (the n+1th object) in our ordering of the objects in our domain has property Φ.

We are then done: we conclude that every object in our domain has the property Φ.

An alternative way of formatting a proof using mathematical induction uses a different formulation of the induction hypothesis. We pick an arbitrary object in our ordering, the nth item. Then, we assume that every object in our ordering before the nth item has the property Φ. For the induction step, we prove that if each object in the ordering before the nth object has the property Φ, then the nth item has the property Φ. (This is sometimes called “strong mathematical induction”, and then the method above is called “weak mathematical induction”. Some logicians insist that the two methods are identical and so they do not appreciate the difference in terms.)

The reasoning here is hopefully intuitive. We show the first item has the property, and that if any object has the property then the next object does. In that case, every object must have the property, because, like falling dominoes, we know the first element has the property, but then the second element does, but then the third element does, and so on.

It will be helpful to give an example of mathematical induction that backs away from our logic and assumes some very basic mathematics, since many people are familiar with arithmetic from years of schooling. Legend has it that as a young child the mathematician Carl Friedrich Gauss (1777-1855) was instructed by a teacher—seeking to occupy the boy in order to shut him up for a while—to add up all the numbers to 100. Gauss very quickly gave the teacher the answer. The legend is that he promptly saw a pattern in adding a sequence of numbers, and derived the following result: 1 + 2 + 3 + … + n = (n2+n) /2. Let’s prove this using mathematical induction.

First, we prove the base case. The sum of 1 is 1. This we see from our own reasoning. The equation is (n2+n) /2, and so we have (12+1) /2 = (1+1) /2, and (1+1)/2 = 2/2, and 2/2 = 1. Thus, the equation works for the base case: it matches our independent reasoning about the base case.

Second, we make the induction hypothesis. We assume for some arbitrary n that the sum of numbers up to the nth number is (n2+n) /2.

Third, we prove the induction step. The next step after n is n+1. Since we assume in the induction hypothesis that the sum of numbers up to nth number is (n2+n) /2, the sum up to the n+1th number is: ((n2+n)/2) + (n+1).

So our task is to show that this is equivalent to the proposed theorem, for the n+1th case. Substitute in (n+1) for n in the equation (n2+n) /2. That gives us: ((n+1)2+ n+1) /2.

If we can show that ((n2+n)/2) + (n+1) = ((n+1)2+ n+1) /2, we will have proven the induction step. Note that the left hand side we got from our induction step plus a simple observation. The right hand side we got from applying the equation. Our question then is, are the results the same? That is, does the equation match what we showed to be true?

Consider the left side. We see:

((n 2 +n)/2) + (n+1) = ((n 2 +n)/2) + (2n+2)/2)

and

((n 2 +n)/2) + (2n+2)/2) = ((n 2 +n) + (2n+2))/2

and

((n 2 +n) + (2n+2))/2 = (n 2 + 3n + 2)/2

Consider now the right side. Here, we are applying the equation. We are hoping it will come out the same as our independent reasoning above. We see:

((n+1)2+ n+1) /2 = ((n2 + 2n + 1) + n + 1)/2

and

((n 2 + 2n + 1) + n + 1)/2 = (n 2 + 3n + 2) /2

So, since the two are identical (that is, our reasoning based on induction and basic observations matched what the equation provides), we have proven the equation applies in the n+1th case, and so we have proven the induction step.

Using mathematical induction we now conclude that the sum of numbers up to the nth number, for any n, is always (n2+n)/2.

17.4 The deduction theorem

How can mathematical induction be useful for something like studying our logic? Here is a practical case. In chapter 6, we introduced conditional derivations. We reasoned that the proof method looked obviously correct, and then we continued using it from that point on. Instead, using mathematical induction, we will now prove that conditional derivation works. We will do this by proving a theorem about our logic: the deduction theorem.

The deduction theorem states that, in our propositional logic, if with some premises, including Φ, we can prove Ψ, then (Φ→Ψ). This is precisely what we want from conditional derivation. We are saying that if we have Φ as a premise, and we are then able to prove Ψ, then we can assert the conditional (Φ→Ψ).

To prove this, we will use mathematical induction. At this point, students will no doubt notice a significant change in our approach to logic. As logic becomes more advanced, we tend to more frequently perform proofs that are stated partly in English, and that are given in what seems a more informal manner. The proofs are in fact not less formal, but they are being done in the metalanguage, supplemented with our natural language. This may at first seem strange, because by working in an axiom system we are also making some aspects of our system more formal. But the axiomatic approach is there to demonstrate certain features of our language. Having convinced ourselves of the basics, we will allow ourselves to leap ahead often now in our explanation of proofs.

Let S be a set of sentences of our logical language. S can be the empty set. These are (all but one of) our premises. We want to single out a particular premise, call it Φ. The deduction theorem says that if S ∪ {Φ}|― Ψ then S |― (Φ→Ψ). The expression “S ∪{Φ}” just means the set that collects together both all the things in S and also all the things in {Φ}. Or, put informally: it means take the set S and add to it also the sentence Φ.

What will be our ordered list of things upon which we perform mathematical induction? A proof, recall, is an ordered list of sentences, the last of which is our conclusion. So we can do induction upon a proof. We let Φ1, Φ2, Φ3…Φn be the list of sentences in our proof. Since this is a proof of Ψ and Φn is our last step in the proof, Φn is our conclusion Ψ. We prove the theorem by considering an arbitrary proof of Ψ that assumes S ∪{Φ}.

First, we consider the first line of our proof. Φ1 must be either (a) an axiom, (b) one of the sentences in the set of premises S, (c) or it must be Φ. Axiom (L1) tells us that (Φ → (Ψ → Φ)), and an instance of this is (Φ1 → (Φ → Φ1)). Thus, with this instance of (L1), and modus ponens, we could derive (Φ → Φ1). This tells us that if either (a) Φ1 is an axiom or (b) it is a premise, we can prove (Φ → Φ1). In case (c), where the first line of the proof is Φ, we can provide a proof like the one above (in section 17.2) to show that (Φ → Φ). Thus, in each case, if we can have Φ1 as a first step of our proof, we know that we could also prove that (Φ → Φ1). The deduction theorem is true of our base case (the first step of the proof).

Now we make the inductive assumption. Let j be any step in our proof. We assume that for any sentence in our ordering before step j, the deduction theorem applies. That is, for any i<j, S |― (Φ→Φi).

We now prove that for the jth step of our proof, the deduction theorem applies. The next step in the proof is line Φj. This step j is either (a) an axiom, (b) one of the sentences in the set of premises S, (c) it is Φ, or (d) it follows from earlier steps in the proof Φg and Φh with modus ponens. In handling the base case, we have already shown how cases (a), (b), and (c) can be used to show that the deduction theorem is true of these cases. The same arguments would apply for step j of the proof. We need only consider the additional case of (d).

We consider then the case where Φj follows from earlier steps in the proof Φg and Φh. That means that one of these (either Φg or Φh) is a conditional, and the other is the antecedent to the conditional. Our induction hypothesis says that for any Φi, where i<j, S |― (Φ→Φi). So, that means for g and h, because g<j and h<j, (Φ→Φg) and also (Φ → Φh). We have noted that one of these formulas must be a conditional; suppose the conditional is Φh. Thus, Φh has the form (Φg→Φj). So we have by the induction hypothesis that (Φ→Φg) and also (Φ → (Φg→Φj)). But we have as an instance of axiom (L2) that ((Φ → (Φg→Φj)) → ((Φ→Φg) → (Φ→Φj))). Repeated applications of modus ponens would give us (Φ→Φj).

This applies to any arbitrary step of our proof, and so it applies when j=n. That is, it applies to the last step of the proof. We have shown that (Φ→Φn). Since this was a proof of Ψ, Φn is Ψ. We have thus shown that if S ∪{Φ}|― Ψ then S |― (Φ→Ψ).

This proof was made simple by the fact that we had very few cases to consider: each line of the proof is either a premise, an axiom, or derived from earlier lines using a single rule, modus ponens. By reducing our system to its minimal elements, an axiom system allows us to prove results like these much more briefly than would be possible if we had many more cases to consider.

Although it adds no power to the system (anything that can be proved using the deduction theorem can be proved in the system without it), the deduction theorem makes many proofs smaller. For example, we can quickly prove other principles, such as the kind of inference traditionally called the “chain rule”. We can call this the “chain rule theorem”.

1. (P1→P2)	assumption
2. (P2→P3)	assumption
3. P1	assumption
4. P2	modus ponens 1, 3
5. P3	modus ponens 2, 4

Which is sufficient to show that {(P1→P2), (P2→P3), P1} |― P3, and, therefore, by the deduction theorem {(P1→P2), (P2→P3)} |― (P1→P3). Using the axioms alone, this proof would require many steps.

17.5 Set theory

One of the most useful tools of logic and mathematics is set theory. Using very simple principles, set theory allows us to construct some of the most complex elements of logic and mathematics. This allows for a pleasing elegance: it seems we can understand and construct many phenomena from very intuitive and basic assumptions.

Set theory was first developed by the mathematician Georg Cantor (1845-1918). The fundamental idea is that of a set. A set is a collection of things, and the identity of a set is determined by its elements. A set is not identical to its elements, however. There is, for example, an empty set: a set that contains nothing. It is however something: it is the set with no members.

With the notion of a set, some basic ideas about set formation, and our first order logic, we have a powerful theory that is useful throughout logic, and allows us to do additional things such as construct the numbers and study infinity. In this section, we can review some concepts of basic intuitive set theory, which we will call “natural set theory”.

The idea of a set is that it is a collection of things. Once we pick our domain of discourse, we can describe collections of things from that domain of discourse as sets. We write these sets using braces. So, if our domain of discourse were the natural numbers, the following would be a set:

{1, 2, 3}

A set is determined by its members (also sometimes called “elements”), but it is not the same thing as its members: 1 is not the same thing as {1}. And, we assume there is an empty set, the set of nothing. We can write this as {} or as ⌀. Sets can contain other sets: the following is a set containing three different members.

ParseError: invalid DekiScript (click for details)

Callstack:
    at (Bookshelves/Philosophy/A_Concise_Introduction_to_Logic_(DeLancey)/03:_A_Look_Forward/3.01:_Some_advanced_topics_in_logic), /content/body/div/div[2]/div[5]/p[7]/span/span, line 1, column 1

This is also an example of a pure set: it contains nothing but sets. If we develop our set theory with sets alone (if our domain of discourse is only sets) and without any other kinds of elements, we call it “pure set theory”.

The members of sets are not ordered. Thus

{1, 2, 3} = {3, 2, 1}

But an ordered set is a set in which the order does matter. We can indicate an ordered set using angle brackets, instead of curly brackets. Thus:

<1, 2, 3> = <1, 2, 3>

but

<1, 2, 3> ≠ <3, 2, 1>

We will write {…} for a set when we want to show its contents, and A, B, … for sets when we are dealing with them more generally. We write x∈A to mean that x is a member of the set A. As noted, sets can be members of sets.

Sets are defined by their contents, so two sets are the same set if they have the same contents.

A = B if and only if ∀x(x∈A ↔ x∈B)

This is interesting because it can be a definition of identity in set theory. In the natural deduction system first order logic, we needed to take identity as a primitive. Here, we have instead defined it using membership and our first order logic.

If all the contents of a set A are in another set B, we say A is a subset of B.

A ⊆ B if and only if ∀x(x∈A → x∈B)

A proper subset is a subset that is not identical (that means B has something not in A, in the following case):

A ⊂ B if and only if (A ⊆ B ^ A≠B)

The empty set is a subset of every set.

The power set operation gives us the set of all subsets of a set.

℘(A) = B if and only if ∀x(x ⊆ A → x∈B)

There are always 2n members in the power set of a set with n members. That is, if A has n members, then ℘(A) has 2n members.

The cardinal size of a set is determined by finding a one-to-one correspondence with the members of the set. Two sets have the same cardinal size (we say, they have the same cardinality) if there is some way to show there exists a one-to-one correspondence between all the members of one set and all the members of the other. For the cardinality of some set A, we can write

|A|

There is a one-to-one correspondence to be found between all the members of A and all the members of B, if and only if

|A| = |B|

If A ⊆ B then |A| ≤ |B|.

The union of two sets is a set that contains every member of either set.

A ∪ B is defined as satisfying ∀x((x∈A v x∈B) → x ∈ A ∪ B)

The intersection of two sets is a set that contains every member that is in both the sets.

A ∩ B is defined as satisfying ∀x((x∈A ^ x∈B) → x ∈ A ∩ B)

A shorthand way to describe a set is to write the following:

{ x | Φ(x)}

This is the set of all those things x such that x has property Φ. So for example if our domain of discourse were natural numbers, then the set of all numbers greater than 100 could be written: { x | x > 100 }.

A relation is a set of ordered sets of more than one element. For example, a binary relation meant to represent squaring might include {… <9, 81>, <10, 100> …}; a trinary relation meant to represent factors might include {… <9, 3, 3>, <10, 2, 5> …}; and so on.

One useful kind of relation is a product. The product of two sets is a set of all the ordered pairs taking a member from the first set and a member from the second.

A × B is defined as satisfying ∀x∀y((x∈A ^ y∈B) ↔ <x, y> ∈ A × B)

Many of us are familiar with the either of the Cartesian product, which forms the Cartesian plane. The x axis is the set of real numbers R, and the y axis is the set of real numbers R. The Cartesian product is the set of ordered pairs R × R. Each such pair we write in the form <x, y>. These form a plane, and we can identify any point in this plane using these “Cartesian coordinates”.

Another useful kind of relation is a function. A function f is a set that is a relation between the members of two sets. One set is called the “domain”, and the other is called the “range”. Suppose A is the domain and B is the range of a function, then (if we let a be a member of A and b and c be members of B, so by writing fab I mean that function f relates a from its domain to b in its range):

If f is a function from A into B, then if fab and fac then b=c

This captures the idea that for each “input” (item in its domain) the function has one “output” (a single corresponding item in its range). We also say a function f is

from a set A if its domain is a subset of A
on a set A if its domain is A
into a set B if its range is a subset of B
onto a set B if its range is B

If a function f is from A and into B, and also f is such that

If fab and fcb then a=c

then f is a 1-to-1 function from A and into B. The idea of being 1-to-1 is that f is a function that if reversed would be a function also. As we noted above, if there is a 1-to-1 function that is on A and onto B, then |A| = |B|.

If a function f is 1-to-1 on A and is into B, then we know that |B| ≥ |A|. (Such a function has every member of A in its domain, and for each such member picks out exactly one member of B; but because we only know that the function is into B, we do not know whether there are members of B that are not in the range of the function, and we cannot be sure that there is some other 1-to-1 function on A and onto B.)

A common notation also is to write f(a) for the function f with a from its domain. So, when we write this, we identify the expression with the element from the range that the function relates to a. That is, f(a) = b.

We can prove many interesting things using this natural set theory. For example, Cantor was able to offer us the following proof that for any set S, |℘(S)| > |S|. That is, the cardinality of the power set of a set is always greater than the cardinality of that set.

We prove the claim by indirect proof. We aim to show that |℘(S)| > |S|, so we assume for reductio that |℘(S)| ≤ |S|. We note that there is a function on S and into ℘(S); this is the function that takes each member of S as its domain, and assigns to that member the set of just that element in ℘(S). So, for example, if a∈S then {a}∈℘(S); and there is function on S and into ℘(S) that assigns a to {a}, b to {b}, and so on.

Since there is such a function, |℘(S)| ≥ |S|. But if |℘(S)| ≤ |S| and |℘(S)| ≥ |S|, then |℘(S)| = |S|. Therefore, there is a one-to-one function on S and onto ℘(S). Let f be one such function.

Consider that each object in S will be in the domain of f, and be related by f to a set in ℘(S). It follows that each element of S must be related to a set that either does, or does not, contain that element. In other words, for each s∈S, f(s) is some set A∈℘(S), and either s∈A or ¬s∈A. Consider now the set of all the objects in the domain of f (that is, all those objects in S) that f relates to a set in ℘(S) that does not contain that element. More formally, this means: consider the set C (for crazy) where C={ s | s∈S ^ ¬s ∈ f(s)}. This set must be in ℘(S), because every possible combination of the elements of S is in ℘(S)—including the empty set and including S. But now, what object in the domain of f is related to this set? We suppose that f is a 1-to-1 function on S and onto ℘(S), so some element of S must be related by f to C, if f exists. Call this element c; that is, suppose f(c)=C.

Is c∈C? If the answer is yes, then it cannot be that f(c)=C, since by definition C is all those elements of S that f relates to sets not containing those elements. So ¬c ∈C. But then, C should contain c, because C contains all those elements of S that are related by f to sets that do not contain that element. So c∈C and ¬c ∈C. We have a contradiction. We conclude that the source of the contradiction was the assumption for reductio, |℘(S)| ≤ |S|. Thus, for any set S, |℘(S)| > |S|.

This result is sometimes called “Cantor’s theorem”. It has interesting consequences, including that there cannot be a largest set: every set has a powerset that is larger than it. This includes even infinite sets; Cantor’s theorem shows us that there are different sized infinities, some larger than others; and, there is no largest infinity.

17.6 Axiomatic first order logic

In this section we briefly describe how the axiomatic approach can be taken toward the first order logic, and how a semantics can be constructed. For simplicity, we shall assume our example language does not include functions.

The basic elements of our language are connectives “→”, “¬”; predicates Fi(…), each with a specific arity; the quantifier ∀; and two kinds of symbolic terms: variables x1…xi… and names a1…ai….

Our syntax is similar to the syntax described in 13.2. Each predicate of arity n followed by n symbolic terms is well formed. If Φ and Ψ are well formed, then ¬Φ and (Φ→Ψ) are well formed. Every well-formed formula preceded by ∀xi is well formed.

A typical axiom system for first order logic takes the axioms of propositional logic and strengthens them with two additional axioms and an additional rule.

(L4) (∀xiΦ(xi) → Φ(α))

(L5) (∀xi(Φ → Ψ) → (Φ →∀xiΨ)) if xi is not free in Φ.

Where α is any symbolic term.

The additional rule is generalization. From |― Φ we can conclude |― ∀xiΦ(xi). This plays the role of universal derivation in our natural deduction system for the first order logic.

This compact system, only slightly larger than the axiomatic propositional logic, is as powerful as our natural deduction system first order logic.

This is a convenient place to describe, in a preliminary and general way, how we can conceive of a formal semantics for the first order logic. When we introduced the first order logic, we kept the semantics intuitive. We can now describe how we could start to develop a formal semantics for the language. The approach here is one first developed by Alfred Tarski (1901-1981).^[14] Tarski introduces a separate concept of satisfaction, which he then uses to define truth, but we will cover over those details just to illustrate the concept underlying a model.

The approach is to assign elements of our language to particular kinds of formal objects. We group these all together into a model. Thus, a model M is an ordered set that contains a number of things. First, it contains a set D of the things that our language is about—our domain of discourse. Second, the model includes our interpretation, I, which contains functions for the elements of our language. For each name, there is a function that relates the name to one object in our domain of discourse. For example, suppose our domain of discourse is natural numbers. Then a name a1 in our language might refer to 1. We say then that the “interpretation” of a1 is 1. The object in our interpretation that captures this idea is a function that includes < a1, 1>.

The interpretation of each predicate relates each predicate to a set of ordered sets of objects from our domain. Each predicate of arity n is related to a set of ordered sets of n things. We can write Dn for all the ordered sets of n elements from our domain. Then, each predicate of arity n has as an interpretation some (not necessarily all) of the relations in Dn. For example, an arity two predicate F1 might be meant to capture the sense of “… is less than or equal to…”. Then the interpretation of F1 is a function from F1 to a set of ordered pairs from our domain of discourse, including such examples as {… <1, 2>, <1, 1000>, <8, 10>, …} and so on.

We then say that a sentence like F1a2a3 is true if the interpretation for a2 and the interpretation for a3 are in the interpretation for F1. So, if a2 is 2, and a3 is 3, then F1a2a3 would be true because <2, 3> will be in the interpretation for F1.

We will need an interpretation for the quantifiers. Without going into the details of how we can describe this rigorously (Tarski uses a set of objects called “sequences”), the idea will be that if we could put any name into the bound variable’s position, and get a sentence that is true, the universally quantified statement is true. To return to our example: assuming again that our domain of discourse is natural numbers, so there is a least number 1, then the sentence ∀xjFa1xj would be true if F1a1xj would be true for any number we put in for xj. Suppose that the interpretation of a1 is 1. Then F1a1xj would be true no matter what element from our domain of discourse we took xj to be referring to, because 1 is less than or equal to every number is our domain of discourse. Therefore, ∀xjFa1xj would be true.

We need only add the usual interpretation for negation and the conditional and we have a semantics for the first order logic.

This is all very brief, but it explains the spirit of the semantics that is standard for contemporary logic. Much more can be said about formal semantics, and hopefully you will feel encouraged to study further.

17.7 Modal logic

Philosophers have always been interested in questions of time and possibility, and have often thought that these were intimately related. In his book, De Interpretatione (titles were given to Aristotle’s books by later editors—he wrote in Greek, and did not name his book with Latin titles), Aristotle wonders about the following. Consider a sentence like “There will be a sea battle tomorrow”. Suppose our prediction turns out true; is it necessarily true when it is made as a prediction? This seems absurd. That is, there might be a sea battle tomorrow, but it does not seem that there must be a sea battle tomorrow.

For there is nothing to prevent someone’s having said ten thousand years beforehand that this would be the case, and another’s having denied it; so that whichever the two was true to say then, will be the case of necessity. (18b35)^[15]

Here is Aristotle’s worry. Suppose tomorrow is June 16, 2014. Necessarily there either will or will not be a sea battle tomorrow (let us assume that we can define “sea battle” and “tomorrow” well enough that they are without vagueness or ambiguity). If there is a sea battle tomorrow, is the person who said today, or even ten thousand years ago, “There will be a sea battle on June 16, 2014” necessarily right? That is, if there is a sea battle tomorrow, is it necessary that there is a sea battle tomorrow?

Aristotle concludes that, if there is a sea battle tomorrow, it is not necessary now.

What is, necessarily is, when it is; and what is not, necessarily is not, when it is not. But not everything that is, necessarily is; and not everything that is not, necessarily is not…. It is necessary for there to be or not to be a sea-battle tomorrow; but it is not necessary for a sea-battle to take place tomorrow, nor for one not to take place—though it is necessary for one to take place or not to take place. (19b30)

Aristotle’s reasoning seems to be, from “necessarily (P v ¬P)”, we should not conclude: “necessarily P or necessarily ¬P”.

Philosophers would like to be able to get clear about these matters, so that we can study and ultimately understand necessity, possibility, and time. For this purpose, philosophers and logicians have developed modal logic. In this logic, we have most often distinguished possibility from time, and treated them as—at least potentially—independent. But that does not mean that we might not ultimately discover that they are essentially related.

In this section, we will describe propositional modal logic; it is also possible to combine modal logic with first order logic, to create what is sometimes called “quantified modal logic”. Our goal however is to reveal the highlights and basic ideas of modal logic, and that is easiest with the propositional logic as our starting point.

Thus, we assume our axiomatic propositional logic—so we have axioms (L1), (L2), (L3); modus ponens; and direct derivation. We also introduce a new element to the language, alone with atomic sentences, the conditional, and negation: “necessary”, which we write as “□”. The syntax of this operator is much like the syntax of negation: if Φ is a sentence, then

□Φ

is a sentence. We read this as saying, “necessarily Φ” or “it is necessary that Φ”. The semantics for necessity are a bit too complex to be treated fairly in this overview. However, one intuitive way to read “necessarily Φ” is to understand it as saying, in every possible world Φ is true. Or: in every possible way the world could be, Φ is true.

It is useful to introduce the concept of possibility. Fortunately, it appears reasonable to define possibility using necessity. We define “possible” to mean, not necessarily not. We use the symbol “◊” for possible.

Thus, if Φ is a sentence, we understand

◊Φ

to be a sentence. Namely, this is the sentence,

¬ □ ¬ Φ

We read “◊Φ” as saying, it is possible that Φ. One intuitive semantics is that it means in at least one possible world, Φ is true; or: in at least one possible way the world could be, Φ is true.

The difficult and interesting task before us is to ask, what axioms best capture the nature of necessity and possibility? This may seem an odd way to begin, but in fact the benefit to us will come from seeing the consequence of various assumptions that we can embody in different axioms. That is, our choice of axioms results in a commitment to understand necessity in a particular way, and we can discover then the consequences of those commitments. Hopefully, as we learn more about the nature of necessity and possibility, we will be able to commit to one of these axiomatizations; or we will be able to improve upon them.

All the standard axiomatizations of modal logic include the following additional rule, which we will call “necessitation”.

⌀ |― Φ

_____

⌀ |― □ Φ

This adds an element of our metalanguage to our rule, so let’s be clear about what it is saying. Remember that “⌀ |― Φ” says that one can prove Φ in this system, without premises (there is only an empty set of premises listed to the left of the turnstile). In other words, “⌀ |― Φ” asserts that Φ is a theorem. The necessitation rule thus says, if Φ is a theorem, then □Φ. The motivation for the rule is hopefully obvious: the theorems of our propositional logic are tautologies. Tautologies are sentences that must be true. And “must” in this description hopefully means at least as much as does “necessarily”. So, for our propositional logic, the theorems are all necessarily true.

Different axiomatizations of modal propositional logic have been proposed. We will review four here, and discuss the ideas that underlie them.

The most basic is known variously as “M” or “T”. It includes the following additional axioms:

(M1) ( □ Φ → Φ)

(M2) (□(Φ → Ψ) → (□Φ → □Ψ))

Both are intuitive. Axiom (M1) says that if Φ is necessary, then Φ. From a necessary claim we can derive that the claim is true. Consider an example: if necessarily 5 > 2, then 5 > 2.

Axiom (M2) says that if it is necessary that Φ implies Ψ, then if Φ is necessary, Ψ is necessary.

An extension of this system is to retain its two axioms, and add the following axiom:

(M3) (Φ → □ ◊Φ)

The resulting system is often called “Brouwer,” after the mathematician Luitzen Brouwer (1881-1966).

Axiom (M3) is more interesting, and perhaps you will find it controversial. It says that if Φ is true, then it is necessary that Φ is possible. What people often find peculiar is the idea that a possibility could be necessary. On the other hand, consider Aristotle’s example. Suppose that there is a sea battle today. Given that it actually is happening, it is possible that it is happening. And, furthermore, given that it is happening, is it not the case that it must be possible that it is happening? Such, at least, is one possible motive for axiom (M3).

More commonly adopted by philosophers are modal systems S4 or S5. These systems assume (M1) and (M2) (but not (M3)), and add one additional axiom. S4 adds the following axiom:

(M4) ( □ Φ → □□ Φ)

This tells us that if Φ is necessary, then it is necessary that Φ is necessary.

In S4, it is possible to prove the following theorems (that is, these are consequences of (M1), (M2), and (M4)):

( □ Φ ↔ □□ Φ)

(◊Φ ↔ ◊◊Φ)

The modal system S5 instead adds to M the following axiom:

(M5) (◊Φ → □ ◊Φ)

This axiom states that if something is possible, then it is necessary that it is possible. This is often referred to as the “S5 axiom”. It is perhaps the most controversial axiom of the standard modal logic systems. Note these interesting corollary theorems of S5:

( □ Φ ↔ ◊ □ Φ)

(◊Φ ↔ □ ◊Φ)

These modal logics are helpful in clarifying a number of matters. Let’s consider several examples, starting with questions about meaning in a natural language.

Many philosophers pursued modal logic in the hopes that it would help us understand and represent some features of meaning in a natural language. These include attempting to better capture the meaning of some forms of “if…then…” expressions in a natural language, and also the meaning of predicates and other elements of a natural language.

A problem with some “if…then…” expressions concerns that sometimes what appears to be a conditional is not well captured by the conditional in our propositional logic. In chapter 3, we had an example that included the sentence “If Miami is the capital of Kansas, then Miami is in Canada”. This sentence is troubling. Interpreting the conditional as we have done, this sentence is true. It is false that Miami is the capital of Kansas, and it is false that Miami is in Canada. A conditional with a false antecedent and a false consequent is true.

However, some of us find that unsatisfactory. We have recognized from the beginning of our study of logic that we were losing some, if not much, of the meaning of a natural language sentence in our translations. But we might want to capture the meaning that is being lost here, especially if it affects the truth value of the sentence. Some people want to say something like this: Kansas is in the United States, so if it were true that Miami were the capital of Kansas, then Miami would be in the United States, not in Canada. By this reasoning, this sentence should be false.

It seems that what we need is a modal notion to capture what is missing here. We want to say, if Miami were in Kansas, then it would be in the United States. Some philosophers thought we could do this by reading such sentences as implicitly including a claim about necessity. Let us fix the claim that Kansas is in the United States, and fix the claim that anything in the United States is not in Canada, but allow that Miami could be the capital of other states. In other words, let us suppose that it is necessary that Kansas is in the United States, and it is necessary that anything in the United States is not in Canada, but possible that Miami is the capital of Kansas. Then it appears we could understand the troubling sentence as saying,

Necessarily, if Miami is the capital of Kansas, then Miami is in Canada.

Assuming an implicit key, we will say,

□(P 1 →P 2 )

This seems to make some progress toward what we were after. We take the sentence “if Miami is the capital of Kansas, then Miami is in Canada“ to mean: in any world where Miami is in Kansas, then in that world Miami is in Canada. But, given the assumptions we have made above, this sentence would be false. There would be worlds where Miami is the capital of Kansas, and Miami is not in Canada. This at least seems to capture our intuition that the sentence is (on one reading) false. There are further subtleties concerning some uses of “if… then….” in English, and it is not clear that the analysis we just gave is sufficient, but one can see how we appear to need modal operators to better capture the meaning of some utterances of natural languages, even for some utterances that do not appear (at first reading) to include modal notions.

Another problem concerning meaning has to do with attempts to better capture the meaning of more fundamental elements of our language, such as our predicates. Our first order logics have what philosophers call an “extensional semantics”. This means that the meaning of terms is merely their referent, and the meaning of predicates is the sum of things that they are true of (see section 17.6). However, upon reflection, this seems inadequate to describe the meaning of terms and predicates in a natural language.

Consider predicates. The philosopher Rudolf Carnap (1891-1970) used the following example; suppose that human beings are the only rational animals (“rational” is rather hard to define, but if we have a strong enough definition—language using, capable of some mathematics, can reason about the future—this seems that it could be true, assuming by “animal” we mean Terrestrial metazoans).^[16] Then the following predicates would be true of all and only the same objects:

… is a rational animal.

… is a human.

That means the extension of these predicates would be the same. That is, if we corralled all and only the rational animals, we would find that we had corralled all and only the humans. If the meaning of these predicates were determined by their extensions, then they would have the same meaning. So the sentence:

All and only humans are rational animals.

would be sufficient, in an extensional semantics, to show that these predicates mean the same thing.

But obviously these predicates do not mean the same thing. How can we improve our logic to better capture the meaning of such natural language phrases? The philosopher Rudolf Carnap proposed that we use modal logic for this purpose. The idea is that we capture the difference using the necessity operator. Two sentences have the same meaning, for example, if necessarily they have the same truth value. That is, “… is a human” and “… is a rational animal” would have the same meaning if and only if they necessarily were true of all and only the same things.

We have not introduced a semantics for modal logic combined with our first order logic. However, the semantics we have discussed are sufficient to make sense of these ideas. Let us use the following key:

F1x: x is a rational animal.

F2x: x is a human.

Then we can translate “All and only rational animals are human” as:

∀x( F 1 x ↔ F 2 x )

And if the meaning of a predicate were merely its extension (or were fully determined by its extension) then these two predicates F1 and F₂ would mean the same thing. The proposal is that in order to describe the meanings of predicates in a natural language, we must look at possible extensions. We could then say that F1 and F2 have the same meaning if and only if the following is true:

□ ∀ x ( F 1 x ↔ F 2 x )

Much is going to turn on our semantics for these predicates and for the necessity operator, but the idea is clear. We see the difference in the meaning between “… is a human” and “… is a rational animal” by identifying possible differences in extensions. It is possible, for example, that some other kind of animal could be rational also. But then it would not be the case that necessarily these two predicates are true of the same things. In a world where, say, descendants of chimpanzees were also rational, there would be rational things that are not human.

The meaning of a predicate that is distinct from its extension is called its “intension”. We have just described one possible “intensional semantics” for predicates: the intension of the two predicates would be the same if and only if the predicates have the same extension in every world (or: in every way the world could be). This seems to get us much closer to the natural language meaning. It also seems to get to something deep about the nature of meaning: to understand a meaning, one must be able to apply it correctly in new and different kinds of situations. One does not know beforehand the extension of a predicate, we might argue; rather, one knows how to recognize things that satisfy that predicate—including things we may not know exist, or may even believe do not exist.

Many philosophers and others who are studying semantics now use modal logic as a standard tool to try to model linguistic meanings.

Modal logics have important uses outside of semantics. Here are two problems in metaphysics that it can help clarify. Metaphysics is that branch of philosophy that studies fundamental problems about the nature of reality, such as the nature of time, mind, or existence.

Consider Aristotle’s problem: is it the case that if there necessarily is or is not a sea battle tomorrow, then it is the case that necessarily there is, or necessarily there is not, a sea battle tomorrow? In each of the systems that we have described, the answer is no. No sentence of the following form is a theorem of any of the systems we have described:

( □( Φ v ¬ Φ) → ( □ Φ v □ ¬ Φ))

It would be a problem if we could derive instances of this claim. To see this, suppose there were a sea battle. Call this claim S. In each of our systems we can prove as a theorem that (S → ◊S), which by definition means that (S → ¬□¬S). (Surely it is a good thing that we can derive this; it would be absurd to say that what is, is not possible.) It is a theorem of propositional logic that (S v ¬S), and so by necessitation we would have □(S v ¬S). Then, with few applications of modus ponens and modal logic, we would have □S.

1. (□(Sv¬S) → (□Sv□¬S))	the problematic premise
2. S	premise
3. (S v ¬S)	theorem
4. □(S v ¬S)	necessitation, 3
5. (□S v □¬S)	modus ponens, 1, 4
6. (S → ¬□¬S)	theorem
7. ¬□¬S	modus ponens 6, 2
8. □S	modus tollendo ponens, 5, 7

But that seems wrong. It was possible, after all, that there would not be a sea battle. Worse, this reason could be applied to any claim. If some claim is true, then it will be necessary. So Aristotle was right to deny the claim that (□(Φ v ¬Φ) → (□Φ v □¬Φ)), and fortunately this is not a theorem of any of our standard modal logic systems.

Although we agree with Aristotle here, we may disagree with him on another point. Aristotle says that what is, is necessary. That is not a claim in any of our logical systems, and it seems right to keep it out. Most likely, Aristotle mixed notions of time and modality together, in a way that, at least from our perspective, seems confusing and perhaps likely to lead to errors. That is, he seems to have thought that once something is true now, that makes it necessary; as if some feature of time determined necessity.

Here is an interesting case from contemporary metaphysics where the axiom (M5) plays an important role. The philosopher David Chalmers (1966-) has argued that contemporary science will not be able to explain the character of consciousness—that is, nothing like contemporary science will explain what it is like to taste a strawberry or listen to Beethoven.^[17] This is not an anti-science argument; rather, his point is that contemporary science does not have the tools required, and will ultimately require radical new additional theory and methods. That has happened before: for example, at one time many philosophers hoped to explain all of nature in terms of the motion and impact of small, rigid particles. This theory, called “atomism”, had great trouble accounting for gravity and magnetism. Ultimately, we added fields to our physical theories. From the perspective of an atomist, this would have meant the overthrow of their view: one needed radical new additions to atomism to explain these phenomena.

Chalmers predicts a similar kind of revolution is required for progress in the study of consciousness. However, one of his primary arguments is very controversial. It goes like this (I revise the argument slightly, to avoid some technical terminology):

1. We can conceive of beings that are physically identical to us but that lack conscious experiences.

2. If we can conceive of beings that are physically identical to us and that lack conscious experiences, then it is possible that there are beings that are physically identical to us but that lack conscious experiences.

3. If it is possible that there are beings that are physically identical to us and that lack conscious experiences, then physical sciences alone cannot explain conscious experience.

4. The physical sciences alone cannot explain conscious experience.

This argument is obviously valid; it requires only two applications of modus ponens to show line 4.

But is the argument sound? This is very important. Many people study the human mind, and many more people would benefit if we better understood the mind. Understanding consciousness would seem to be an important part of that. So, psychologists, psychiatrists, philosophers, artificial intelligence researchers, and many others should care. But, if this argument is sound, we should start spending grant money and time and other resources on radical new methods and approaches, if we want to understand consciousness.

Many philosophers have denied premise 1 of this argument. Those philosophers argue that, although it is easy to just say, “I can conceive of beings that are physically identical to us but that lack conscious experiences”, if I really thought hard about what this means, I would find it absurd and no longer consider it conceivable. After all, imagine that the people weeping next to you at a funeral feel nothing, or that a person who has a severe wound and screams in pain feels nothing. This is what this argument claims is possible: people could exist who act at all times, without exception, as if they feel pain and sadness and joy and so on, but who never do. Perhaps, if we think that through, we’ll say it is not really conceivable.

Premise 2 is very controversial, since it seems that we might fool ourselves into thinking we can conceive of something that is not possible. It seems to mix human capabilities, and thus subjective judgments, with claims about what is objectively possible.

Premise 3 turns on technical notions about what it means for a physical science to explain a phenomenon; suffice it to say most philosophers agree with premise 3.

But what is interesting is that something appears to be wrong with the argument if we adopt the S5 axiom, (M5). In particular, it is a theorem of modal logic system S5 that (□Φ ↔ ◊□Φ). With this in mind, consider the following argument, which makes use of key claims in Chalmers’s argument.

1. We can conceive of it being true that necessarily a being physically identical to us will have the same conscious experience as us.

2. If we can conceive of it being true that necessarily a being physically identical to us will have the same conscious experience as us, then it is possible that necessarily a being physically identical to us will have the same conscious experience as us.

3. It is possible that necessarily a being physically identical to us will have the same conscious experience as us.

4. If it is possible that necessarily a being physically identical to us will have the same conscious experience as us, then necessarily a being physically identical to us will have the same conscious experience as us.

5. Necessarily a being physically identical to us will have the same conscious experience as us.

Premise 1 of this argument seems plausible; surely it is at least as plausible as the claim that we can conceive of a being physically identical to us that does not have phenomenal experience. If we can imagine those weird people who act like they feel, but do not feel, then we can also imagine that whenever people act and operate like us, they feel as we do.

Premise 2 uses the same reasoning as premise 2 of Chalmers’ argument: what is conceivable is possible.

Line 3 is introduced just to clarify the argument; it follows from modus ponens of premises 2 and 1.

Line 4 uses (◊□Φ→ □Φ), which we can derive in S5.

The conclusion, line 5, follows by modus ponens from 4 and 3. The conclusion is what those in this debate agree would be sufficient to show that the physical sciences can explain the phenomenon (via the same reasoning that went into line 3 of Chalmers’s argument).

If we accept modal system S5, then it seems that there is something wrong with Chalmers’s argument, since the kind of reasoning used by the argument can lead us to contradictory conclusions. Defenders of Chalmers’s argument must either reject axiom (M5), or deny line 1 of this new argument. As I noted above, this matters because we would like to know how best to proceed in understanding the mind; whether either of these arguments is sound would help determine how best we can proceed.

17.8 An example of another axiom system: Peano arithmetic

We close this brief introduction to the axiomatic approach with a look at how we can extend the first order logic to enable us to study a familiar specific task: arithmetic.

We have mentioned Gottlob Frege several times in this text. This mathematician and philosopher made essential contributions to first order logic, the philosophy of mathematics, and the philosophy of language. His primary goal in all his work, however, was to prove that mathematics was an extension of logic. This view was called “logicism”.

Why would anyone be interested in such a project? When Frege was writing, there were many disagreements about some fundamental issues in mathematics. These included the status of set theory, which many found very useful, but others disliked. Frege’s proposed solution was very elegant. If we could derive arithmetic from a handful of logical axioms that everyone would find unobjectionable, then we would have a way to settle all disputes about arithmetic. To put a controversial claim beyond criticism, we would have to show that we could derive it directly from our unobjectionable axioms.

Unfortunately, Frege’s project failed. His ultimate axiomatization of arithmetic proved to have a fatal flaw. While Frege had the final volume of this project at the press, he received a postcard from a young philosopher, Bertrand Russell. Russell was attempting the same project as Frege, but he had noticed a problem. In Frege’s system, Russell was able to derive a contradiction. This was devastating: just as in our logic, in Frege’s logic you could derive anything from a contradiction. And so this meant Frege’s system was inconsistent: it could prove falsehoods.

What Russell proved is now called “Russell’s Paradox”. It would be better called “Russell’s Contradiction”, since it is explicitly a contradiction. The logical systems that Russell and Frege used had special entities called “classes”, but given that most people today are more familiar with sets, we can restate Russell’s Contradiction with sets. Russell showed that in Frege’s system he could derive a sentence that meant something like: there is a set of all sets that do not belong to themselves.

Think about this for a moment. Does this set belong to itself? If it does, it should not, because it is supposedly the set of all and only those things that do not belong to themselves. Thus, it should not belong to itself. But then, it would be one of those sets that do not belong to themselves, and so it should belong to itself. We have a contradiction.

Frege was devastated. From our perspective, this is unfortunate, because his accomplishments were already so great that he need not have felt this error was a refutation of all his work.

Logicism did not die. Russell found a way to avoid the contradiction using a kind of logical system called “type theory”, and he pursued the construction of arithmetic in his huge book, written with Alfred Whitehead, Principia Mathematica. This book’s system is, unfortunately, much more cumbersome than Frege’s system, and earned few converts to logicism. Today, however, there are still mathematicians who believe that type theory may be the best way to understand mathematics.

More often, however, treatments of the foundations of arithmetic turn directly to an axiomatization that includes explicitly mathematical elements. The most familiar of these is typically called “the Peano axioms”, although these are significantly modified from the version of Giuseppe Peano (1858-1932) after whom they are named. Peano himself cites Richard Dedekind (1831-1916) as his source. A standard version of Peano’s axioms is surprisingly compact, and proves sufficient to do all the things we want arithmetic to do. The axioms, however, clearly assume and specify things about numbers and operations on numbers. We sometimes call such axioms “non-logical axioms” or “proper axioms”, not because there is something irrational about them, but because they clearly aim to capture some rich notion that may appear to be more specific than a highly general logical notion.

Here is one possible set of axioms for arithmetic, restricted to addition and multiplication.

We assume all the axioms of the first order logic described in section 17.6 (that is, we have (L1) through (L5)). We have a single special term, 0. We have three special function letters in our language, f1 and f2, which are arity two, and f3, which is arity one. f1 and f2 are to be given an interpretation consistent with addition and multiplication. f3 is the successor function; think of it as adding one to any number. We have an arity-two predicate P1, which we interpret as we would “=”. However, for reasons of familiarity, we will write “+” and “∙” and “=”. For the successor function we will use “′”. Thus we write the successor of 0 by writing 0′. The successor of 0 is of course 1. The successor of 1 is 2, expressed 0′′, and so on.

With all that in mind, one such axiom system is:

(A1)	(x1 = x2 → x1′ = x2′)
(A2)	(x1′ = x2′ → x1 = x2)
(A3)	(x1 = x2 → (x1 = x3 → x2 = x3))
(A4)	¬ x1′ = 0
(A5)	x1 + 0 = x1
(A6)	x1 + x2′ = (x1 + x2)′
(A7)	x1 ∙ 0 = 0
(A8)	x1 ∙ x2′ = (x1 ∙ x2) + x1

(Remember that “+” and “∙” are used here only as shorthand for our functions; in our object language we do not have infix notation and we do not need the disambiguating parentheses shown here for axioms (A6) and (A8). That is, axiom (A6), for example, is actually P1f1x1f3x2f3f1x1x2 . Looking at that sentence, you will no doubt realize why it is very helpful to switch over to our usual way of writing these things.)

We can also make mathematical induction an explicit part of this system (instead of a principle of our metalogical reasoning alone), and include it as an axiom:

(A9) (Φ(0) → (∀xi(Φ(xi) →Φ(xi′)) →∀xiΦ(xi)))

These axioms are sufficient to do everything that we expect of arithmetic. This is quite remarkable, because arithmetic is very powerful and flexible, and these axioms are few and seemingly simple and obvious. Since these explicitly formulate our notions of addition, multiplication, and numbers, they do not achieve what Frege dreamed; he hoped that the axioms would be more general, and from them one would derive things like addition and multiplication. But this is still a powerful demonstration of how we can reduce great disciplines of reasoning to compact fundamental principles.

Axiomatizations like this one have allowed us to study and discover shocking things about arithmetic, the most notable being the discovery by the logician Kurt Godel (1906-1978) that arithmetic is either incomplete or inconsistent.

In closing our discussion of axiomatic systems, we can use this system to prove that 1+1=2. We will use the indiscernibility of identicals rule introduced in 15.3. We start by letting x1 be 0′ and x2 be 0 to get the instance of axiom (A6) on line 1, and the instance of (A5) on line 2.

1.	0′ + 0′ = (0′ + 0)′	axiom (A6)
2.	0′ + 0 = 0′	axiom (A5)
3.	0′ + 0′ = 0′′	indiscernibility of indenticals, 1, 2

Because line 2 tells us that 0′ + 0 and 0′ are the same, using indiscernibility of identicals we can substitute 0′ for 0′ + 0 in the right hand side of the identity in line 1. Remember that the parentheses are there only to disambiguate our shorthand; this means (0′)′ and 0′′ are actually the same.

17.9 Problems

Use a truth table to show that the following sentences are equivalent (using your natural deduction system definitions of the connectives, as given in part I of the book).
1. (Φ^Ψ) and ¬(Φ→¬Ψ).
2. (ΦvΨ) and (¬Φ→Ψ).
3. (Φ ↔ Ψ) and ¬((Φ→Ψ)→¬(Ψ→Φ)).
Prove the following arguments are valid, using only our axioms of the axiomatic propositional logic, modus ponens, and the deduction theorem. For some of these problems, you will need to translate some of the sentences as required into the corresponding formula with only negation and conditional as our connectives (follow the interpretation you studied in problem 1).
1. {(P1^ P2)} |― P1
2. {(¬P2, (P1→P2)}|― ¬P1
3. {(P1 v P2), ¬P2}|― P1
4. {(P 1 → (P 2 →P 3 ))}|― (P 2 → (P 1 →P 3 ))
Describe a function that is on A and onto B and also 1-to-1, so that we know that |A| = |B|, for each of the following. Your function can be stated in informal mathematical notation. Remember that the natural numbers are {1, 2, 3…} and the integers are all the whole numbers, both negative and positive, and including 0.
1. A is the even natural numbers, B is the odd natural numbers.
2. A is the natural numbers, and B is the numbers {10, 11, 12…}.
3. A is the natural numbers, B is the negative integers.
Prove the following claims. Your proof can use the definitions given in 17.5 and any theorems of our first order logic, and can be stated in English.
1. For any sets A and B, ({A}={B} ↔ A=B).
2. For any sets A and B, (A⊆B ↔ A∩B=A).
3. For any sets A and B, (A⊆B ↔ A∪B=B).
4. For any set A, (A∩⌀=⌀).
Prove the following, keeping as resources the propositional logic and the specified modal system.
1. In M: (Φ→◊Φ)
2. In S4: (□Φ↔□□Φ).
3. In S4: (◊Φ↔◊◊Φ)
4. In S5: (□Φ↔◊□Φ).
For these problems and the next, you will find they are much easier if you allow yourself two rules sometimes added to natural deduction systems. These are: (1) Contraposition, which allows one to assert that (¬Ψ→¬Φ) given (Φ→Ψ); and (2) a relaxing of double negation that allows one to remove any double negation from anywhere in a formula, or insert any double negation anywhere in a formula as long as the result is well formed.
Show that S5 is equivalent to the combination of Brouwer and S4. You can do this by showing that from the axioms of Brouwer ((M1), (M2), and (M3)) and S4 (axiom (M4)) you can derive the additional axioms of S5 (in this case, just (M5)). Then show that from the axioms of S5 ((M1), (M2), and (M5)) you can derive the additional axioms of Brouwer and S4 (that is, (M3) and (M4)).
Prove the following theorems, using only our axioms of the axiomatic first order logic, modus ponens, and the deduction theorem.
1. ∀ x 1 ( P 1 ( x 1 ) → P 2 ( x 1 )) → ( ∀ x 1 P 1 ( x 1 ) →∀ x 1 P 2 ( x 1 ))
2. ∀ x 1 ( ¬P 2 ( x 1 ) → ¬P 1 ( x 1 )) → ∀ x 1 ( P 1 ( x 1 ) → P 2 ( x 1 ))
3. ∀ x 1 (P 1 ( x 1 ) →¬P 2 ( x 1 ) ) → ∀x(P 2 ( x 1 ) → ¬P 1 ( x 1 ) )
4. ∀x1¬(P1(x1) → ¬P2(x1)) → ∀x1P1(x1)
Using only first order logic, the Peano axioms, and indiscernibility of identicals, prove that
1. 1+0 = 0+1
2. 2+2 = 4
3. 2∙2 = 4
4. 2∙2 = 2+2

[13] A good source is (Mendelson 2010: 32-34).

[14] See Tarski (1956).

[15] This translation is based on that in (Barnes 1984: 29).

[16] See Carnap (1956).

[17] See Chalmers (1996).